Chapter 1

Overview of Wireless Communications 



Wireless communications is, by any measure, the fastest growing segment of the communications industry. As
such, it has captured the attention of the media and the imagination of the public. Cellular systems have
experienced exponential growth over the last decade and there are currently around two billion users worldwide. Indeed,
cellular phones have become a critical business tool and part of everyday life in most developed countries, and
are rapidly supplanting antiquated wireline systems in many developing countries. In addition, wireless local area
networks currently supplement or replace wired networks in many homes, businesses, and campuses. Many new
applications, including wireless sensor networks, automated highways and factories, smart homes and appliances,
and remote telemedicine, are emerging from research ideas to concrete systems. The explosive growth of
wireless systems coupled with the proliferation of laptop and palmtop computers indicates a bright future for wireless
networks, both as stand-alone systems and as part of the larger networking infrastructure. However, many
technical challenges remain in designing robust wireless networks that deliver the performance necessary to support
emerging applications. In this introductory chapter we will briefly review the history of wireless networks, from
the smoke signals of the pre-industrial age to the cellular, satellite, and other wireless networks of today. We
then discuss the wireless vision in more detail, including the technical challenges that must be overcome to make
this vision a reality. We describe current wireless systems along with emerging systems and standards. The gap
between current and emerging systems and the vision for future wireless applications indicates that much work
remains to be done to make this vision a reality.

1.1 History of Wireless Communications 

The first wireless networks were developed in the pre-industrial age. These systems transmitted information over
line-of-sight distances (later extended by telescopes) using smoke signals, torch signaling, flashing mirrors, signal
flares, or semaphore flags. An elaborate set of signal combinations was developed to convey complex messages
with these rudimentary signals. Observation stations were built on hilltops and along roads to relay these messages 
over large distances. These early communication networks were replaced first by the telegraph network (invented 
by Samuel Morse in 1838) and later by the telephone. In 1895, a few decades after the telephone was invented, 
Marconi demonstrated the first radio transmission from the Isle of Wight to a tugboat 18 miles away, and radio 
communications was born. Radio technology advanced rapidly to enable transmissions over larger distances with 
better quality, less power, and smaller, cheaper devices, thereby enabling public and private radio communications, 
television, and wireless networking. 

Early radio systems transmitted analog signals. Today most radio systems transmit digital signals composed
of binary bits, where the bits are obtained directly from a data signal or by digitizing an analog signal. A digital
radio can transmit a continuous bit stream or it can group the bits into packets. The latter type of radio is called
a packet radio and is characterized by bursty transmissions: the radio is idle except when it transmits a packet.
The first network based on packet radio, ALOHANET, was developed at the University of Hawaii in 1971. This
network enabled computer sites at seven campuses spread out over four islands to communicate with a central
computer on Oahu via radio transmission. The network architecture used a star topology with the central computer
at its hub. Any two computers could establish a bi-directional communications link between them by going through
the central hub. ALOHANET incorporated the first set of protocols for channel access and routing in packet radio
systems, and many of the underlying principles in these protocols are still in use today. The U.S. military was
extremely interested in the combination of packet data and broadcast radio inherent to ALOHANET. Throughout the
1970's and early 1980's the Defense Advanced Research Projects Agency (DARPA) invested significant resources
to develop networks using packet radios for tactical communications in the battlefield. The nodes in these ad hoc
wireless networks had the ability to self-configure (or reconfigure) into a network without the aid of any established
infrastructure. DARPA's investment in ad hoc networks peaked in the mid 1980's, but the resulting networks fell
far short of expectations in terms of speed and performance. These networks continue to be developed for
military use. Packet radio networks also found commercial application in supporting wide-area wireless data services.
These services, first introduced in the early 1990's, enabled wireless data access (including email, file transfer, and
web browsing) at fairly low speeds, on the order of 20 Kbps. A strong market for these wide-area wireless data
services never really materialized, due mainly to their low data rates, high cost, and lack of "killer applications".
These services mostly disappeared in the 1990s, supplanted by the wireless data capabilities of cellular telephones
and wireless local area networks (LANs).

The introduction of wired Ethernet technology in the 1970's steered many commercial companies away from 
radio-based networking. Ethernet’s 10 Mbps data rate far exceeded anything available using radio, and companies 
did not mind running cables within and between their facilities to take advantage of these high rates. In 1985 the
Federal Communications Commission (FCC) enabled the commercial development of wireless LANs by
authorizing the public use of the Industrial, Scientific, and Medical (ISM) frequency bands for wireless LAN products.
The ISM band was very attractive to wireless LAN vendors since they did not need to obtain an FCC license to 
operate in this band. However, the wireless LAN systems could not interfere with the primary ISM band users, 
which forced them to use a low power profile and an inefficient signaling scheme. Moreover, the interference from 
primary users within this frequency band was quite high. As a result these initial wireless LANs had very poor 
performance in terms of data rates and coverage. This poor performance, coupled with concerns about security, 
lack of standardization, and high cost (the first wireless LAN access points listed for $1,400 as compared to a few 
hundred dollars for a wired Ethernet card) resulted in weak sales. Few of these systems were actually used for data 
networking: they were relegated to low-tech applications like inventory control. The current generation of wireless
LANs, based on the family of IEEE 802.11 standards, has better performance, although the data rates are still
relatively low (maximum collective data rates of tens of Mbps) and the coverage area is still small (around 150 m).
Wired Ethernets today offer data rates of 100 Mbps, and the performance gap between wired and wireless LANs is
likely to increase over time without additional spectrum allocation. Despite the big data rate differences, wireless
LANs are becoming the preferred Internet access method in many homes, offices, and campus environments due to
their convenience and freedom from wires. However, most wireless LANs support applications such as email and 
web browsing that are not bandwidth-intensive. The challenge for future wireless LANs will be to support many 
users simultaneously with bandwidth-intensive and delay-constrained applications such as video. Range extension 
is also a critical goal for future wireless LAN systems. 

By far the most successful application of wireless networking has been the cellular telephone system. The
roots of this system date to 1915, when wireless voice transmission between New York and San Francisco was
first established. In 1946 public mobile telephone service was introduced in 25 cities across the United States.
These initial systems used a central transmitter to cover an entire metropolitan area. This inefficient use of the
radio spectrum, coupled with the state of radio technology at that time, severely limited the system capacity: thirty
years after the introduction of mobile telephone service the New York system could only support 543 users.

A solution to this capacity problem emerged during the 50's and 60's when researchers at AT&T Bell
Laboratories developed the cellular concept [4]. Cellular systems exploit the fact that the power of a transmitted signal
falls off with distance. Thus, two users can operate on the same frequency at spatially-separate locations with 
minimal interference between them. This allows very efficient use of cellular spectrum so that a large number of 
users can be accommodated. The evolution of cellular systems from initial concept to implementation was glacial. 
In 1947 AT&T requested spectrum for cellular service from the FCC. The design was mostly completed by the end 
of the 1960’s, the first field test was in 1978, and the FCC granted service authorization in 1982, by which time 
much of the original technology was out-of-date. The first analog cellular system deployed in Chicago in 1983 
was already saturated by 1984, at which point the FCC increased the cellular spectral allocation from 40 MHz to 
50 MHz. The explosive growth of the cellular industry took almost everyone by surprise. In fact a marketing study 
commissioned by AT&T before the first system rollout predicted that demand for cellular phones would be limited 
to doctors and the very rich. AT&T basically abandoned the cellular business in the 1980's to focus on fiber optic
networks, eventually returning to the business after its potential became apparent. Throughout the late 1980’s, 
as more and more cities became saturated with demand for cellular service, the development of digital cellular 
technology for increased capacity and better performance became essential. 

The second generation of cellular systems, first deployed in the early 1990's, was based on digital
communications. The shift from analog to digital was driven by the higher capacity of digital systems and the improved
cost, speed, and power efficiency of digital hardware. While second generation cellular systems initially provided
mainly voice services, these systems gradually evolved to support data services such as email, Internet access, and
short messaging. Unfortunately, the great market potential for cellular phones led to a proliferation of second generation
cellular standards: three different standards in the U.S. alone, and other standards in Europe and Japan, all
incompatible. The fact that different cities have different incompatible standards makes roaming throughout the U.S.
and the world using one cellular phone standard impossible. Moreover, some countries have initiated service for
third generation systems, for which there are also multiple incompatible standards. As a result of the standards
proliferation, many cellular phones today are multi-mode: they incorporate multiple digital standards to
facilitate nationwide and worldwide roaming, and possibly the first generation analog standard as well, since only this
standard provides universal coverage throughout the U.S.

Satellite systems are typically characterized by the height of the satellite orbit: low-earth orbit (LEOs at
roughly 2000 km altitude), medium-earth orbit (MEOs at roughly 9000 km altitude), or geosynchronous orbit
(GEOs at roughly 40,000 km altitude). The geosynchronous orbits are seen as stationary from the earth, whereas
the satellites with other orbits have their coverage area change over time. The concept of using geosynchronous 
satellites for communications was first suggested by the science fiction writer Arthur C. Clarke in 1945. However, 
the first deployed satellites, the Soviet Union’s Sputnik in 1957 and the NASA/Bell Laboratories’ Echo-1 in 1960, 
were not geosynchronous due to the difficulty of lifting a satellite into such a high orbit. The first GEO satellite 
was launched by Hughes and NASA in 1963. GEOs then dominated both commercial and government satellite 
systems for several decades. 

Geosynchronous satellites have large coverage areas, so fewer satellites (and dollars) are necessary to provide
wide-area or global coverage. However, it takes a great deal of power to reach the satellite, and the propagation
delay is typically too large for delay-constrained applications like voice. These disadvantages caused a shift in
the 1990's towards lower orbit satellites [6, 7]. The goal was to provide voice and data service competitive with
cellular systems. However, the satellite mobile terminals were much bigger, consumed much more power, and
cost much more than contemporary cellular phones, which limited their appeal. The most compelling feature of
these systems is their ubiquitous worldwide coverage, especially in remote areas or third-world countries with no
landline or cellular system infrastructure. Unfortunately, such places do not typically have large demand or the
resources to pay for satellite service either. As cellular systems became more widespread, they took away most
revenue that LEO systems might have generated in populated areas. With no real market left, most LEO satellite
systems went out of business.
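
As a rough check on the delay argument above, one-way propagation delay can be estimated as path length divided by the speed of light. The sketch below uses the approximate orbit altitudes quoted in the text and assumes, for simplicity, that the terminal sits directly beneath the satellite; the function name and the straight-up path are illustrative assumptions, not part of any satellite system specification.

```python
# Back-of-the-envelope propagation delay for the orbit classes named above.
# Assumption: the terminal is directly below the satellite, so the path
# length equals the orbit altitude (real slant paths are longer).

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

# Approximate altitudes quoted in the text, in kilometers.
ORBIT_ALTITUDE_KM = {"LEO": 2_000, "MEO": 9_000, "GEO": 40_000}


def one_way_delay_ms(altitude_km: float) -> float:
    """Return the one-way propagation delay in milliseconds."""
    path_m = altitude_km * 1_000.0
    return path_m / SPEED_OF_LIGHT_M_PER_S * 1_000.0


if __name__ == "__main__":
    for name, altitude in ORBIT_ALTITUDE_KM.items():
        up = one_way_delay_ms(altitude)
        # A signal relayed through the satellite travels up and back down.
        print(f"{name}: {up:.1f} ms up, {2 * up:.1f} ms up and down")
```

Under these assumptions a single GEO hop costs roughly 133 ms in each direction, whereas a LEO hop at 2000 km stays below 7 ms, which is consistent with the move towards lower orbits for voice service.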

A natural area for satellite systems is broadcast entertainment. Direct broadcast satellites operate in the 12 
GHz frequency band. These systems offer hundreds of TV channels and are major competitors to cable.
Satellite-delivered digital radio has also become popular. These systems, operating in both Europe and the US, offer digital
audio broadcasts at near-CD quality. 

1.2 Wireless Vision 

The vision of wireless communications supporting information exchange between people or devices is the
communications frontier of the next few decades, and much of it already exists in some form. This vision will allow
multimedia communication from anywhere in the world using a small handheld device or laptop. Wireless
networks will connect palmtop, laptop, and desktop computers anywhere within an office building or campus, as well
as from the corner cafe. In the home these networks will enable a new class of intelligent electronic devices that 
can interact with each other and with the Internet in addition to providing connectivity between computers, phones, 
and security/monitoring systems. Such smart homes can also help the elderly and disabled with assisted living, 
patient monitoring, and emergency response. Wireless entertainment will permeate the home and any place that 
people congregate. Video teleconferencing will take place between buildings that are blocks or continents apart,
and these conferences can include travelers as well, from the salesperson who missed his plane connection to the 
CEO off sailing in the Caribbean. Wireless video will enable remote classrooms, remote training facilities, and 
remote hospitals anywhere in the world. Wireless sensors have an enormous range of both commercial and military 
applications. Commercial applications include monitoring of fire hazards, hazardous waste sites, stress and strain 
in buildings and bridges, carbon dioxide movement, and the spread of chemicals and gases at a disaster site. These
wireless sensors self-configure into a network to process and interpret sensor measurements and then convey this 
information to a centralized control location. Military applications include identification and tracking of enemy 
targets, detection of chemical and biological attacks, support of unmanned robotic vehicles, and counter-terrorism. 
Finally, wireless networks enable distributed control systems, with remote devices, sensors, and actuators linked 
together via wireless communication channels. Such networks enable automated highways, mobile robots, and 
easily-reconfigurable industrial automation. 

The various applications described above are all components of the wireless vision. So what, exactly, is
wireless communications? There are many different ways to segment this complex topic into different applications,
systems, or coverage regions [37]. Wireless applications include voice, Internet access, web browsing, paging and
short messaging, subscriber information services, file transfer, video teleconferencing, entertainment, sensing, and
distributed control. Systems include cellular telephone systems, wireless LANs, wide-area wireless data systems,
satellite systems, and ad hoc wireless networks. Coverage regions include in-building, campus, city, regional,
and global. The question of how best to characterize wireless communications along these various segments
has resulted in considerable fragmentation in the industry, as evidenced by the many different wireless products,
standards, and services being offered or proposed. One reason for this fragmentation is that different wireless
applications have different requirements. Voice systems have relatively low data rate requirements (around 20
Kbps) and can tolerate a fairly high probability of bit error (bit error rates, or BERs, of around 10^{-3}), but the
total delay must be less than around 30 msec or it becomes noticeable to the end user. On the other hand, data
systems typically require much higher data rates (1-100 Mbps) and very small BERs (the target BER is 10^{-8}
and all bits received in error must be retransmitted) but do not have a fixed delay requirement. Real-time video
systems have high data rate requirements coupled with the same delay constraints as voice systems, while paging
and short messaging have very low data rate requirements and no delay constraints. These diverse requirements for
different applications make it difficult to build one wireless system that can efficiently satisfy all these requirements
simultaneously. Wired networks typically integrate the diverse requirements of different applications using a single protocol.
This integration requires that the most stringent requirements for all applications be met simultaneously. While this
may be possible on some wired networks, with data rates on the order of Gbps and BERs on the order of 10^{-12},
it is not possible on wireless networks, which have much lower data rates and higher BERs. For these reasons, at
least in the near future, wireless systems will continue to be fragmented, with different protocols tailored to support
the requirements of different applications.
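
To make the integration argument concrete, the sketch below records the voice and data requirements quoted above and computes what a single protocol serving both would have to deliver: the highest rate, the lowest BER, and the tightest delay bound at once. The `Requirement` class and `most_stringent` function are hypothetical names introduced only for this illustration, and the data entry uses the low end of the quoted 1-100 Mbps range.

```python
# Application requirements quoted in the text, and the combined requirement
# that one protocol serving every application would have to satisfy.
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Requirement:
    min_rate_bps: float           # minimum data rate
    max_ber: float                # highest tolerable bit error rate
    max_delay_s: Optional[float]  # None means no fixed delay requirement


# Values from the text; "data" uses the low end of the 1-100 Mbps range.
REQUIREMENTS = {
    "voice": Requirement(min_rate_bps=20e3, max_ber=1e-3, max_delay_s=0.030),
    "data": Requirement(min_rate_bps=1e6, max_ber=1e-8, max_delay_s=None),
}


def most_stringent(reqs: Iterable[Requirement]) -> Requirement:
    """Combine requirements: highest rate, lowest BER, tightest delay."""
    reqs = list(reqs)
    delays = [r.max_delay_s for r in reqs if r.max_delay_s is not None]
    return Requirement(
        min_rate_bps=max(r.min_rate_bps for r in reqs),
        max_ber=min(r.max_ber for r in reqs),
        max_delay_s=min(delays) if delays else None,
    )


if __name__ == "__main__":
    print(most_stringent(REQUIREMENTS.values()))
```

With these two entries the combined requirement is 1 Mbps at a BER of 10^{-8} with a 30 msec delay bound, which is stricter than either application needs on its own.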

The exponential growth of cellular telephone use and wireless Internet access has led to great optimism about
wireless technology in general. Obviously not all wireless applications will flourish. While many wireless systems 
and companies have enjoyed spectacular success, there have also been many failures along the way, including 
first generation wireless LANs, the Iridium satellite system, wide area data services such as Metricom, and fixed 
wireless access (wireless “cable”) to the home. Indeed, it is impossible to predict what wireless failures and 
triumphs lie on the horizon. Moreover, there must be sufficient flexibility and creativity among both engineers and 
regulators to allow for accidental successes. It is clear, however, that the current and emerging wireless systems of 
today, coupled with the vision of applications that wireless can enable, ensure a bright future for wireless technology.

1.3 Technical Issues 

Many technical challenges must be addressed to enable the wireless applications of the future. These challenges 
extend across all aspects of the system design. As wireless terminals add more features, these small devices must 
incorporate multiple modes of operation to support the different applications and media. Computers process voice, 
image, text, and video data, but breakthroughs in circuit design are required to implement the same multimode
operation in a cheap, lightweight, handheld device. Since consumers don’t want large batteries that frequently 
need recharging, transmission and signal processing in the portable terminal must consume minimal power. The 
signal processing required to support multimedia applications and networking functions can be power-intensive. 
Thus, infrastructure-based wireless networks, such as wireless LANs and cellular systems, place as much of the
processing burden as possible on fixed sites with large power resources. The associated bottlenecks and single
points-of-failure are clearly undesirable for the overall system. Ad hoc wireless networks without infrastructure
are highly appealing for many applications due to their flexibility and robustness. For these networks all processing
and control must be performed by the network nodes in a distributed fashion, making energy-efficiency challenging 
to achieve. Energy is a particularly critical resource in networks where nodes cannot recharge their batteries, for 
example in sensing applications. Network design to meet the application requirements under such hard energy 
constraints remains a big technological hurdle. The finite bandwidth and random variations of wireless channels
also require robust applications that degrade gracefully as network performance degrades.

Design of wireless networks differs fundamentally from wired network design due to the nature of the wireless 
channel. This channel is an unpredictable and difficult communications medium. First of all, the radio spectrum 
is a scarce resource that must be allocated to many different applications and systems. For this reason spectrum 
is controlled by regulatory bodies both regionally and globally. A regional or global system operating in a given 
frequency band must obey the restrictions for that band set forth by the corresponding regulatory body. Spectrum 
can also be very expensive since in many countries spectral licenses are often auctioned to the highest bidder. 
In the U.S. companies spent over nine billion dollars for second generation cellular licenses, and the auctions in 
Europe for third generation cellular spectrum garnered around 100 billion dollars. The spectrum obtained through 
these auctions must be used extremely efficiently to get a reasonable return on its investment, and it must also 
be reused over and over in the same geographical area, thus requiring cellular system designs with high capacity 
and good performance. At frequencies around several gigahertz, wireless radio components with reasonable size,
power consumption, and cost are available. However, the spectrum in this frequency range is extremely crowded.
Thus, technological breakthroughs to enable higher frequency systems with the same cost and performance would
greatly reduce the spectrum shortage. However, path loss at these higher frequencies is larger, thereby limiting
range, unless directional antennas are used.
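
The frequency dependence of path loss mentioned above can be illustrated with the standard free-space formula for isotropic antennas, in which loss in dB equals 20 log10(4*pi*d*f/c). This is only a sketch under a free-space assumption: real channels add reflections, blockage, and antenna gains, and the 2.4 GHz and 24 GHz carriers below are arbitrary example values rather than figures from the text.

```python
# Free-space path loss between isotropic antennas (an idealized model;
# real wireless channels generally lose more than this).
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def free_space_path_loss_db(distance_m: float, frequency_hz: float) -> float:
    """Return free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    ratio = 4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT_M_PER_S
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    distance_m = 100.0
    # Example carriers chosen only for illustration.
    for frequency_hz in (2.4e9, 24e9):
        loss = free_space_path_loss_db(distance_m, frequency_hz)
        print(f"{frequency_hz / 1e9:.1f} GHz at {distance_m:.0f} m: {loss:.1f} dB")
```

In this model a tenfold increase in carrier frequency adds 20 dB of loss at the same distance, which is the range penalty that directional antenna gain would need to offset.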

As a signal propagates through a wireless channel, it experiences random fluctuations in time if the transmitter,
receiver, or surrounding objects are moving, due to changing reflections and attenuation. Thus, the characteristics
of the channel appear to change randomly with time, which makes it difficult to design reliable systems with
guaranteed performance. Security is also more difficult to implement in wireless systems, since the airwaves are 
susceptible to snooping from anyone with an RF antenna. The analog cellular systems have no security, and one 
can easily listen in on conversations by scanning the analog cellular frequency band. All digital cellular systems 
implement some level of encryption. However, with enough knowledge, time and determination most of these 
encryption methods can be cracked and, indeed, several have been compromised. To support applications like 
electronic commerce and credit card transactions, the wireless network must be secure against such listeners. 

Wireless networking is also a significant challenge. The network must be able to locate a given user wherever 
it is among billions of globally-distributed mobile terminals. It must then route a call to that user as it moves at 
speeds of up to 100 km/hr. The finite resources of the network must be allocated in a fair and efficient manner
relative to changing user demands and locations. Moreover, there currently exists a tremendous infrastructure of 
wired networks: the telephone system, the Internet, and fiber optic cable, which should be used to connect wireless 
systems together into a global network. However, wireless systems with mobile users will never be able to compete 
with wired systems in terms of data rates and reliability. Interfacing between wireless and wired networks with 
vastly different performance capabilities is a difficult problem. 

Perhaps the most significant technical challenge in wireless network design is an overhaul of the design 
process itself. Wired networks are mostly designed according to a layered approach, whereby protocols associated 
with different layers of the system operation are designed in isolation, with baseline mechanisms to interface 
between layers. The layers in a wireless system include the link or physical layer, which handles bit transmissions
over the communications medium, the access layer, which handles shared access to the communications medium,
the network and transport layers, which route data across the network and ensure end-to-end connectivity and data
delivery, and the application layer, which dictates the end-to-end data rates and delay constraints associated with
the application. While a layering methodology reduces complexity and facilitates modularity and standardization,
it also leads to inefficiency and performance loss due to the lack of a global design optimization. The large
capacity and good reliability of wired networks make these inefficiencies relatively benign for many wired network
applications, although they do preclude good performance of delay-constrained applications such as voice and
video. The situation is very different in a wireless network. Wireless links can exhibit very poor performance,
and this performance, along with user connectivity and network topology, changes over time. In fact, the very
notion of a wireless link is somewhat fuzzy due to the nature of radio propagation and broadcasting. The dynamic
nature and poor performance of the underlying wireless communication channel indicate that high-performance
networks must be optimized for this channel and must be robust and adaptive to its variations, as well as to network
dynamics. Thus, these networks require integrated and adaptive protocols at all layers, from the link layer to the
application layer. This cross-layer protocol design requires interdisciplinary expertise in communications, signal
processing, and network theory and design.

In the next section we give an overview of the wireless systems in operation today. It will be clear from
this overview that the wireless vision remains a distant goal, with many technical challenges to overcome. These
challenges will be examined in detail throughout the book.

1.4 Current Wireless Systems



This section provides a brief overview of wireless systems in operation today. The design details of these
systems are constantly evolving, with new systems emerging and old ones going by the wayside. Thus, we will
focus mainly on the high-level design aspects of the most common systems. More details on wireless system
standards can be found in [1, 2, 3]. A summary of the main wireless system standards is given in Appendix D.

1.4.1 Cellular Telephone Systems 

Cellular telephone systems are extremely popular and lucrative worldwide: these are the systems that ignited the 
wireless revolution. Cellular systems provide two-way voice and data communication with regional, national, or 
international coverage. Cellular systems were initially designed for mobile terminals inside vehicles with antennas 
mounted on the vehicle roof. Today these systems have evolved to support lightweight handheld mobile terminals 
operating inside and outside buildings at both pedestrian and vehicle speeds. 

The basic premise behind cellular system design is frequency reuse, which exploits the fact that signal power 
falls off with distance to reuse the same frequency spectrum at spatially-separated locations. Specifically, the 
coverage area of a cellular system is divided into nonoverlapping cells where some set of channels is assigned 
to each cell. This same channel set is used in another cell some distance away, as shown in Figure 1.1, where 
Ci denotes the channel set used in a particular cell. Operation within a cell is controlled by a centralized base
station, as described in more detail below. The interference caused by users in different cells operating on the same 
channel set is called intercell interference. The spatial separation of cells that reuse the same channel set, the reuse 
distance, should be as small as possible so that frequencies are reused as often as possible, thereby maximizing 
spectral efficiency. However, as the reuse distance decreases, intercell interference increases, due to the smaller 
propagation distance between interfering cells. Since intercell interference must remain below a given threshold 
for acceptable system performance, reuse distance cannot be reduced below some minimum value. In practice it 
is quite difficult to determine this minimum value since both the transmitting and interfering signals experience 
random power variations due to the characteristics of wireless signal propagation. In order to determine the best 
reuse distance and base station placement, an accurate characterization of signal propagation within the cells is 
needed. 
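
The tradeoff between reuse distance and intercell interference can be sketched with a deliberately simplified model. It assumes that received power falls off as distance raised to the power -gamma, that the desired user sits at the cell edge (distance R from its base station), and that K equal-power interferers are all at roughly the reuse distance D, giving a signal-to-interference ratio of about (D/R)^gamma / K. The default values gamma = 4 and K = 6, and both function names, are illustrative assumptions; the model ignores the random power variations that the text identifies as the real difficulty.

```python
# Simplified signal-to-interference ratio (SIR) versus reuse distance.
# Assumptions: power ~ distance**(-gamma), desired user at the cell edge,
# and all interferers at approximately the reuse distance.
import math


def sir_db(reuse_distance: float, cell_radius: float,
           gamma: float = 4.0, num_interferers: int = 6) -> float:
    """Approximate SIR in dB for a user at the cell edge."""
    sir_linear = (reuse_distance / cell_radius) ** gamma / num_interferers
    return 10.0 * math.log10(sir_linear)


def min_reuse_distance(target_sir_db: float, cell_radius: float,
                       gamma: float = 4.0, num_interferers: int = 6) -> float:
    """Smallest reuse distance that meets the target SIR in this model."""
    target_linear = 10.0 ** (target_sir_db / 10.0)
    return cell_radius * (num_interferers * target_linear) ** (1.0 / gamma)


if __name__ == "__main__":
    radius_km = 1.0
    for target in (10.0, 18.0):
        d = min_reuse_distance(target, radius_km)
        print(f"target SIR {target:.0f} dB -> reuse distance about {d:.2f} km")
```

Shrinking the reuse distance lowers the SIR in this model, so the interference threshold sets a floor on how aggressively frequencies can be reused.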

Initial cellular system designs were mainly driven by the high cost of base stations, approximately one million 
dollars apiece. For this reason early cellular systems used a relatively small number of cells to cover an entire city 
or region. The cell base stations were placed on tall buildings or mountains and transmitted at very high power with 
cell coverage areas of several square miles. These large cells are called macrocells. Signal power was radiated 
uniformly in all directions, so a mobile moving in a circle around the base station would have approximately 
constant received power if the signal was not blocked by an attenuating object. This circular contour of constant 
power yields a hexagonal cell shape for the system, since a hexagon is the closest shape to a circle that can cover a 
given area with multiple nonoverlapping cells. 

Cellular systems in urban areas now mostly use smaller cells with base stations close to street level
transmitting at much lower power. These smaller cells are called microcells or picocells, depending on their size. This
evolution to smaller cells occurred for two reasons: the need for higher capacity in areas with high user density and
the reduced size and cost of base station electronics. A cell of any size can support roughly the same number of 
users if the system is scaled accordingly. Thus, for a given coverage area a system with many microcells has a 
higher number of users per unit area than a system with just a few macrocells. In addition, less power is required 
at the mobile terminals in microcellular systems, since the terminals are closer to the base stations. However, the 
evolution to smaller cells has complicated network design. Mobiles traverse a small cell more quickly than a large 
cell, and therefore handoffs must be processed more quickly. In addition, location management becomes more 
complicated, since there are more cells within a given area where a mobile may be located. It is also harder to
develop general propagation models for small cells, since signal propagation in these cells is highly dependent
on base station placement and the geometry of the surrounding reflectors. In particular, a hexagonal cell shape is
generally not a good approximation to signal propagation in microcells. Microcellular systems are often designed
using square or triangular cell shapes, but these shapes have a large margin of error in their approximation to
microcell signal propagation [9].

Figure 1.1: Cellular Systems.
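
The capacity argument for smaller cells can be checked with a little arithmetic. The sketch below assumes, as the text does, that every cell supports roughly the same number of users regardless of its size; the region size, cell areas, and users-per-cell figure are made-up illustrative numbers, not values from any deployed system.

```python
def users_per_km2(region_km2, cell_area_km2, users_per_cell):
    """User density when a region is tiled with equal-size cells.

    Assumes each cell supports the same number of users regardless of size.
    """
    num_cells = region_km2 / cell_area_km2
    return num_cells * users_per_cell / region_km2


# Hypothetical numbers: a 100 km^2 city, 50 users per cell.
macro_density = users_per_km2(100, 25, 50)  # four 25 km^2 macrocells
micro_density = users_per_km2(100, 1, 50)   # one hundred 1 km^2 microcells
print(macro_density, micro_density)
```

Under these assumptions the density scales inversely with cell area, so shrinking the cell area by a factor of 25 raises the supported users per unit area by the same factor.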

All base stations in a given geographical area are connected via a high-speed communications link to a mobile 
telephone switching office (MTSO), as shown in Figure 1.2. The MTSO acts as a central controller for the network, 
allocating channels within each cell, coordinating handoffs between cells when a mobile traverses a cell boundary, 
and routing calls to and from mobile users. The MTSO can route voice calls through the public switched telephone 
network (PSTN) or provide Internet access. A new user located in a given cell requests a channel by sending a call 
request to the cell’s base station over a separate control channel. The request is relayed to the MTSO, which accepts 
the call request if a channel is available in that cell. If no channels are available then the call request is rejected. A
call handoff is initiated when the base station or the mobile in a given cell detects that the received signal power for 
that call is approaching a given minimum threshold. In this case the base station informs the MTSO that the mobile 
requires a handoff, and the MTSO then queries surrounding base stations to determine if one of these stations can 
detect that mobile’s signal. If so then the MTSO coordinates a handoff between the original base station and the 
new base station. If no channels are available in the cell with the new base station then the handoff fails and the
call is terminated. A call will also be dropped if the signal strength between a mobile and its base station drops 
below the minimum threshold needed for communication due to random signal variations. 
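
The call admission and handoff rules described above can be sketched as a small state machine. This is a toy model under stated assumptions, not the actual MTSO signaling protocol: the class name `Mtso`, its methods, and the idea of passing in the list of cells that can detect the mobile are all hypothetical simplifications.

```python
class Mtso:
    """Toy central controller that tracks free channels per cell."""

    def __init__(self, channels_per_cell):
        # channels_per_cell: dict mapping cell id -> number of free channels
        self.free = dict(channels_per_cell)
        self.calls = {}  # mobile id -> serving cell id

    def request_call(self, mobile, cell):
        """Accept a new call only if the cell has a free channel."""
        if self.free.get(cell, 0) == 0:
            return False  # call request rejected
        self.free[cell] -= 1
        self.calls[mobile] = cell
        return True

    def handoff(self, mobile, detecting_cells):
        """Move a call to a cell that detects the mobile and has a free channel.

        If no such cell exists, the handoff fails and the call is terminated.
        """
        old = self.calls[mobile]
        for cell in detecting_cells:
            if cell != old and self.free.get(cell, 0) > 0:
                self.free[cell] -= 1
                self.free[old] += 1
                self.calls[mobile] = cell
                return True
        # Handoff failed: release the old channel and drop the call.
        self.free[old] += 1
        del self.calls[mobile]
        return False


mtso = Mtso({"A": 1, "B": 1})
print(mtso.request_call("u1", "A"))  # channel available in cell A
print(mtso.request_call("u2", "A"))  # cell A is now full
print(mtso.handoff("u1", ["B"]))     # cell B has a free channel
```

A real system would also model the control channel, signal-strength thresholds, and timing, all of which are omitted here.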

The first generation of cellular systems used analog communications, since they were primarily designed in
the 1960's, before digital communications became prevalent. Second generation systems moved from analog to
digital because of the many advantages of digital technology. The components are cheaper, faster, smaller, and
require less power. Voice quality is improved due to error correction coding. Digital systems also have higher
capacity than analog systems since they can use more spectrally-efficient digital modulation and more efficient
techniques to share the cellular spectrum. They can also take advantage of advanced compression techniques and
voice activity factors. In addition, encryption techniques can be used to secure digital signals against
eavesdropping. Digital systems can also offer data services in addition to voice, including short messaging, email,
Internet access, and imaging capabilities (camera phones). Because digital systems cost less and are more
efficient, service providers used aggressive pricing tactics to encourage user migration from analog to digital
systems, and today analog systems are primarily used in areas with no digital service. However, digital systems
do not always work as well as the analog ones. Users can experience poor voice quality, frequent call dropping,
and spotty coverage in certain areas. System performance has certainly improved as the technology and networks
mature. In some areas cellular phones provide almost the same quality as landline service. Indeed, some people
have replaced their wireline telephone service inside the home with cellular service.

Figure 1.2: Current Cellular Network Architecture

Spectral sharing in communication systems, also called multiple access, is done by dividing the signaling 
dimensions along the time, frequency, and/or code space axes. In frequency-division multiple access (FDMA) the 
total system bandwidth is divided into orthogonal frequency channels. In time-division multiple access (TDMA) 
time is divided orthogonally and each channel occupies the entire frequency band over its assigned timeslot. TDMA 
is more difficult to implement than FDMA since the users must be time-synchronized. However, it is easier to ac- 
commodate multiple data rates with TDMA since multiple timeslots can be assigned to a given user. Code-division 
multiple access (CDMA) is typically implemented using direct-sequence or frequency-hopping spread spectrum 
with either orthogonal or non-orthogonal codes. In direct-sequence each user modulates its data sequence by a 
different chip sequence which is much faster than the data sequence. In the frequency domain, the narrowband 
data signal is convolved with the wideband chip signal, resulting in a signal with a much wider bandwidth than 
the original data signal. In frequency-hopping the carrier frequency used to modulate the narrowband data signal 
is varied by a chip sequence which may be faster or slower than the data sequence. This results in a modulated 
signal that hops over different carrier frequencies. Typically spread spectrum signals are superimposed onto each 
other within the same signal bandwidth. A spread spectrum receiver separates out each of the distinct signals by 
separately decoding each spreading sequence. However, for non-orthogonal codes users within a cell interfere 
with each other (intracell interference) and codes that are reused in other cells cause intercell interference. Both
the intracell and intercell interference power is reduced by the spreading gain of the code. Moreover, interference 
in spread spectrum systems can be further reduced through multiuser detection and interference cancellation. More 
details on these different techniques for spectrum sharing and their performance analysis will be given in
Chapters 13-14. The design tradeoffs associated with spectrum sharing are very complex, and the decision of which
technique is best for a given system and operating environment is never straightforward.
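
To make the direct-sequence idea concrete, the following sketch superimposes two users in the same band and separates them again by correlating with each user's chip sequence. It assumes an idealized setting that the text does not specify: a noiseless, chip-synchronous channel, bits represented as +1/-1, and two 4-chip orthogonal (Walsh) codes chosen purely for illustration.

```python
def spread(bits, code):
    """Replace each data bit (+1 or -1) with the bit times the chip sequence."""
    return [b * c for b in bits for c in code]


def despread(signal, code):
    """Correlate the received chips with one code, one bit period at a time."""
    n = len(code)
    bits = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], code))
        bits.append(1 if corr > 0 else -1)
    return bits


# Two orthogonal 4-chip codes: their chip-by-chip product sums to zero.
code_a = [1, 1, 1, 1]
code_b = [1, -1, 1, -1]

bits_a = [1, -1, 1]
bits_b = [-1, -1, 1]

# Both spread signals occupy the same bandwidth and simply add.
received = [x + y for x, y in zip(spread(bits_a, code_a), spread(bits_b, code_b))]

print(despread(received, code_a))
print(despread(received, code_b))
```

Because the codes are orthogonal, each correlator sees zero contribution from the other user. With non-orthogonal codes the cross-correlation term would not vanish, which is the intracell interference described above.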

Efficient cellular system designs are interference-limited, i.e. the interference dominates the noise floor since
otherwise more users could be added to the system. As a result, any technique to reduce interference in cellular
systems leads directly to an increase in system capacity and performance. Some methods for interference reduction
in use today or proposed for future systems include cell sectorization, directional and smart antennas, multiuser
detection, and dynamic resource allocation. Details of these techniques will be given in Chapter 15.

The first generation (1G) cellular systems in the U.S., called the Advanced Mobile Phone Service (AMPS),
used FDMA with 30 kHz FM-modulated voice channels. The FCC initially allocated 40 MHz of spectrum to
this system, which was increased to 50 MHz shortly after service introduction to support more users. This total 
bandwidth was divided into two 25 MHz bands, one for mobile-to-base station channels and the other for base 
station-to-mobile channels. The FCC divided these channels into two sets that were assigned to two different ser- 
vice providers in each city to encourage competition. A similar system, the European Total Access Communication 
System (ETACS), emerged in Europe. AMPS was deployed worldwide in the 1980's and remains the only cellular 
service in some of these areas, including some rural parts of the U.S. 
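
The AMPS figures above imply a rough channel count, which the short calculation below works out. It is a back-of-the-envelope sketch only: it divides the band evenly and ignores guard bands and control channels, so the deployed channel plan may differ slightly.

```python
def fdma_channel_count(band_hz, channel_hz):
    """Number of whole FDMA channels that fit in a band, ignoring guard bands."""
    return band_hz // channel_hz


# Figures from the text: 25 MHz per direction, 30 kHz per voice channel.
channels_per_direction = fdma_channel_count(25_000_000, 30_000)

# The FCC split the channels between two service providers per city.
channels_per_provider = channels_per_direction // 2

print(channels_per_direction, channels_per_provider)
```

So each provider has on the order of four hundred channel pairs to reuse across all of its cells in a city.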

Many of the first generation cellular systems in Europe were incompatible, and the Europeans quickly
converged on a uniform standard for second generation (2G) digital systems called GSM¹. The GSM standard uses
a combination of TDMA and slow frequency hopping with frequency-shift keying for the voice modulation. In
contrast, the standards activities in the U.S. surrounding the second generation of digital cellular provoked a rag- 
ing debate on spectrum sharing techniques, resulting in several incompatible standards [10, 11, 12]. In particular, 
there are two standards in the 900 MHz cellular frequency band: IS-54, which uses a combination of TDMA and 
FDMA and phase-shift keyed modulation, and IS-95, which uses direct-sequence CDMA with binary modulation
and coding [13, 14]. The spectrum for digital cellular in the 2 GHz PCS frequency band was auctioned off, so 
service providers could use an existing standard or develop proprietary systems for their purchased spectrum. The 
end result has been three different digital cellular standards for this frequency band: IS-136 (which is basically
the same as IS-54 at a higher frequency), IS-95, and the European GSM standard. The digital cellular standard 
in Japan is similar to IS-54 and IS-136 but in a different frequency band, and the GSM system in Europe is at a
different frequency than the GSM systems in the U.S. This proliferation of incompatible standards in the U.S. and 
internationally makes it impossible to roam between systems nationwide or globally without a multi-mode phone 
and/or multiple phones (and phone numbers). 

All of the second generation digital cellular standards have been enhanced to support high rate packet data
services [15]. GSM systems provide data rates of up to 100 Kbps by aggregating all timeslots together for a single
user. This enhancement is called GPRS. A more fundamental enhancement, Enhanced Data Rates for GSM
Evolution (EDGE), further increases data rates using a high-level modulation format combined with FEC coding.
This modulation is more sensitive to fading effects, and EDGE uses adaptive techniques to mitigate this problem.
Specifically, EDGE defines six different modulation and coding combinations, each optimized to a different value
of received SNR. The received SNR is measured at the receiver and fed back to the transmitter, and the best
modulation and coding combination for this SNR value is used. The IS-54 and IS-136 systems currently provide
data rates of 40-60 Kbps by aggregating time slots and using high-level modulation. This evolution of the IS-136
standard is called IS-136HS (high-speed). The IS-95 systems support higher data rates using a time-division technique
called high data rate (HDR) [16].
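
The adaptive scheme described for EDGE amounts to a table lookup driven by the SNR fed back from the receiver. The sketch below illustrates that selection logic only. The six scheme labels, SNR thresholds, and relative rates are hypothetical placeholders, not the values defined in the EDGE standard.

```python
# (minimum SNR in dB, label, relative data rate) -- illustrative values only,
# NOT taken from the EDGE specification. Sorted by increasing threshold.
SCHEMES = [
    (0.0, "scheme-1", 1.0),
    (5.0, "scheme-2", 1.5),
    (10.0, "scheme-3", 2.0),
    (15.0, "scheme-4", 3.0),
    (20.0, "scheme-5", 4.0),
    (25.0, "scheme-6", 5.0),
]


def select_scheme(snr_db):
    """Return the highest-rate scheme whose SNR threshold is met.

    Falls back to the most robust scheme when the SNR is below every threshold.
    """
    best = SCHEMES[0]
    for scheme in SCHEMES:
        if snr_db >= scheme[0]:
            best = scheme
    return best


# The receiver measures SNR and feeds it back; the transmitter picks a scheme.
for measured_snr in (-3.0, 12.3, 40.0):
    print(measured_snr, select_scheme(measured_snr)[1])
```

A practical design would also add hysteresis so that the transmitter does not oscillate between schemes when the SNR hovers near a threshold.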

The third generation (3G) cellular systems are based on a wideband CDMA standard developed under the
auspices of the International Telecommunications Union (ITU) [15]. The standard, initially called International 
Mobile Telecommunications 2000 (IMT-2000), provides different data rates depending on mobility and location, 
from 384 Kbps for pedestrian use to 144 Kbps for vehicular use to 2 Mbps for indoor office use. The 3G standard 
is incompatible with 2G systems, so service providers must invest in a new infrastructure before they can provide 
3G service. The first 3G systems were deployed in Japan. One reason that 3G services came out first in Japan 
is the process of 3G spectrum allocation, which in Japan was awarded without much up-front cost. The 3G 
spectrum in both Europe and the U.S. is allocated based on auctioning, thereby requiring a huge initial investment 
for any company wishing to provide 3G service. European companies collectively paid over 100 billion dollars
in their 3G spectrum auctions. There has been much controversy over the 3G auction process in Europe, with
companies charging that the nature of the auctions caused enormous overbidding and that it will be very difficult
if not impossible to reap a profit on this spectrum. A few of the companies have already decided to write off their
investment in 3G spectrum and not pursue system buildout. In fact 3G systems have not grown as anticipated
in Europe, and it appears that data enhancements to 2G systems may suffice to satisfy user demands. However,
the 2G spectrum in Europe is severely overcrowded, so users will either eventually migrate to 3G or regulations
will change so that 3G bandwidth can be used for 2G services (which is not currently allowed in Europe). 3G
development in the U.S. has lagged far behind that of Europe. The available 3G spectrum in the U.S. is only about
half that available in Europe. Due to wrangling about which parts of the spectrum will be used, the 3G spectral
auctions in the U.S. have not yet taken place. However, the U.S. does allow the 1G and 2G spectrum to be used for
3G, and this flexibility may allow a more gradual rollout and investment than the more restrictive 3G requirements
in Europe. It appears that delaying 3G in the U.S. will allow U.S. service providers to learn from the mistakes and
successes in Europe and Japan.

¹ The acronym GSM originally stood for Groupe Spécial Mobile, the name of the European charter establishing the GSM standard. As
GSM systems proliferated around the world, the underlying acronym meaning was changed to Global System for Mobile Communications.

1.4.2 Cordless Phones 

Cordless telephones first appeared in the late 1970’s and have experienced spectacular growth ever since. Many 
U.S. homes today have only cordless phones, which can be a safety risk since these phones don’t work in a power 
outage, in contrast to their wired counterparts. Cordless phones were originally designed to provide a low-cost 
low-mobility wireless connection to the PSTN, i.e. a short wireless link to replace the cord connecting a telephone 
base unit and its handset. Since cordless phones compete with wired handsets, their voice quality must be similar.
Initial cordless phones had poor voice quality and were quickly discarded by users. The first cordless systems 
allowed only one phone handset to connect to each base unit, and coverage was limited to a few rooms of a house 
or office. This is still the main premise behind cordless telephones in the U.S. today, although some base units now 
support multiple handsets and coverage has improved. In Europe and Asia digital cordless phone systems have 
evolved to provide coverage over much wider areas, both in and away from home, and are similar in many ways to
cellular telephone systems. 

The base units of cordless phones connect to the PSTN in the exact same manner as a landline phone, and 
thus they impose no added complexity on the telephone network. The movement of these cordless handsets is 
extremely limited: a handset must remain within range of its base unit. There is no coordination with other 
cordless phone systems, so a high density of these systems in a small area, e.g. an apartment building, can result in 
significant interference between systems. For this reason cordless phones today have multiple voice channels and 
scan between these channels to find the one with minimal interference. Many cordless phones use spread spectrum 
techniques to reduce interference from other cordless phone systems and from other systems like baby monitors 
and wireless LANs. 
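
The channel-scanning behavior mentioned above can be illustrated in a few lines. This is a minimal sketch that assumes the handset has already measured an interference level on each voice channel; the measurement values are invented, and the function name `pick_channel` is hypothetical.

```python
def pick_channel(interference_dbm):
    """Return the index of the channel with the lowest measured interference."""
    return min(range(len(interference_dbm)), key=lambda ch: interference_dbm[ch])


# Hypothetical interference measurements (dBm) on three voice channels.
measurements = [-70.0, -95.5, -82.0]
print(pick_channel(measurements))
```

Since neighboring cordless systems do not coordinate, each handset makes this choice on its own, and the scan has to be repeated as nearby systems start and stop transmitting.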

In Europe and Asia the second generation of digital cordless phones (CT-2, for cordless telephone, second 
generation) have an extended range of use beyond a single residence or office. Within a home these systems operate 
as conventional cordless phones. To extend the range beyond the home, base stations, also called phone-points or
telepoints, are mounted in places where people congregate, like shopping malls, busy streets, train stations, and
airports. Cordless phones registered with the telepoint provider can place calls whenever they are in range of a
telepoint. Calls cannot be received from the telepoint since the network has no routing support for mobile users, 
although some CT-2 handsets have built-in pagers to compensate for this deficiency. These systems also do not 
hand off calls if a user moves between different telepoints, so a user must remain within range of the telepoint where
his call was initiated for the duration of the call. Telepoint service was introduced twice in the United Kingdom and 
failed both times, but these systems grew rapidly in Hong Kong and Singapore through the mid 1990’s. This rapid 
growth deteriorated quickly after the first few years, as cellular phone operators cut prices to compete with telepoint 
service. The main complaint about telepoint service was the incomplete radio coverage and lack of handoff. Since
cellular systems avoid these problems, as long as prices were competitive there was little reason for people to use
telepoint services. Most of these services have now disappeared. 

Another evolution of the cordless telephone designed primarily for office buildings is the European DECT 
system. The main function of DECT is to provide local mobility support for users in an in-building private branch 
exchange (PBX). In DECT systems base units are mounted throughout a building, and each base station is attached
through a controller to the PBX of the building. Handsets communicate with the nearest base station in the building,
and calls are handed off as a user walks between base stations. DECT can also ring handsets from the closest
base station. The DECT standard also supports telepoint services, although this application has not received much
attention, probably due to the failure of CT-2 services. There are currently around 7 million DECT users in Europe,
but the standard has not yet spread to other countries. 

A more advanced cordless telephone system that emerged in Japan is the Personal Handyphone System (PHS). 
The PHS system is quite similar to a cellular system, with widespread base station deployment supporting handoff
and call routing between base stations. With these capabilities PHS does not suffer from the main limitations of 
the CT-2 system. Initially PHS systems enjoyed one of the fastest growth rates ever for a new technology. In 1997, 
two years after its introduction, PHS subscribers peaked at about 7 million users, but its popularity then started to 
decline due to sharp price cutting by cellular providers. In 2005 there were about 4 million subscribers, attracted 
by the flat-rate service and relatively high speeds (128 Kbps) for data. PHS operators are trying to push data rates
up to 1 Mbps, which cellular providers cannot compete with. The main difference between a PHS system and a 
cellular system is that PHS cannot support call handoff at vehicle speeds. This deficiency is mainly due to the 
dynamic channel allocation procedure used in PHS. Dynamic channel allocation greatly increases the number of 
handsets that can be serviced by a single base station and their corresponding data rates, thereby lowering the 
system cost, but it also complicates the handoff procedure. Given the sustained popularity of PHS, it is unlikely 
to go the same route as CT-2 any time soon, especially if much higher data rates become available. However, it is 
clear from the recent history of cordless phone systems that extending the range of these systems beyond the home
requires either similar or better functionality than cellular systems or a significantly reduced cost. 

1.4.3 Wireless LANs 

Wireless LANs provide high-speed data within a small region, e.g. a campus or small building, as users move from 
place to place. Wireless devices that access these LANs are typically stationary or moving at pedestrian speeds.
All wireless LAN standards in the U.S. operate in unlicensed frequency bands. The primary unlicensed bands
are the ISM bands at 900 MHz, 2.4 GHz, and 5.8 GHz, and the Unlicensed National Information Infrastructure
(U-NII) band at 5 GHz. In the ISM bands unlicensed users are secondary users and so must cope with interference
from primary users when such users are active. There are no primary users in the U-NII band. An FCC license is
not required to operate in either the ISM or U-NII bands. However, this advantage is a double-edged sword, since 
other unlicensed systems operate in these bands for the same reason, which can cause a great deal of interference 
between systems. The interference problem is mitigated by setting a limit on the power per unit bandwidth for 
unlicensed systems. Wireless LANs can have either a star architecture, with wireless access points or hubs placed 
throughout the coverage region, or a peer-to-peer architecture, where the wireless terminals self-configure into a 
network. 

Dozens of wireless LAN companies and products appeared in the early 1990's to capitalize on the “pent- 
up demand” for high-speed wireless data. These first generation wireless LANs were based on proprietary and 
incompatible protocols. Most operated within the 26 MHz spectrum of the 900 MHz ISM band using direct 
sequence spread spectrum, with data rates on the order of 1-2 Mbps. Both star and peer-to-peer architectures were 
used. The lack of standardization for these products led to high development costs, low-volume production, and 
small markets for each individual product. Of these original products only a handful were even mildly successful. 
Only one of the first generation wireless LANs, Motorola’s Altair, operated outside the 900 MHz band. This
system, operating in the licensed 18 GHz band, had data rates on the order of 6 Mbps. However, performance
of Altair was hampered by the high cost of components and the increased path loss at 18 GHz, and Altair was 
discontinued within a few years of its release. 
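
The path loss penalty at 18 GHz relative to the 900 MHz band can be estimated with the standard free-space path loss formula, 20 log10(4 pi d f / c). This is a rough sketch only: it assumes free-space propagation with the same antenna gains at both frequencies, which indoor LAN environments do not satisfy, and the 50 m distance is an arbitrary example.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB between isotropic antennas."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


# Compare the two bands at the same (arbitrary) 50 m distance.
loss_900mhz = free_space_path_loss_db(50.0, 900e6)
loss_18ghz = free_space_path_loss_db(50.0, 18e9)
extra_loss_db = loss_18ghz - loss_900mhz

print(round(extra_loss_db, 2))  # about 26 dB, independent of distance
```

Under this model the difference depends only on the frequency ratio (a factor of 20 here), so the 18 GHz link gives up roughly 26 dB at any distance before obstructions are even considered.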

The second generation of wireless LANs in the U.S. operates with 80 MHz of spectrum in the 2.4 GHz ISM
band. A wireless LAN standard for this frequency band, the IEEE 802.11b standard, was developed to avoid 
some of the problems with the proprietary first generation systems. The standard specifies direct sequence spread 
spectrum with data rates of around 1.6 Mbps (raw data rates of 11 Mbps) and a range of approximately 150 
m. The network architecture can be either star or peer-to-peer, although the peer-to-peer feature is rarely used. 
Many companies developed products based on the 802.11b standard, and after slow initial growth the popularity 
of 802.11b wireless LANs has expanded considerably. Many laptops come with integrated 802.11b wireless LAN
cards. Companies and universities have installed 802.11b base stations throughout their locations, and many coffee
houses, airports, and hotels offer wireless access, often for free, to increase their appeal. 

Two additional standards in the 802.11 family were developed to provide higher data rates than 802.11b. The
IEEE 802.11a wireless LAN standard operates with 300 MHz of spectrum in the 5 GHz U-NII band. The 802.11a
standard is based on multicarrier modulation and provides 20-70 Mbps data rates. Since 802.11a has much more
bandwidth and consequently many more channels than 802.11b, it can support more users at higher data rates.
There was some initial concern that 802.11a systems would be significantly more expensive than 802.11b systems,
but in fact they quickly became quite competitive in price. The other standard, 802.11g, also uses multicarrier
modulation and can be used in either the 2.4 GHz or 5 GHz bands with speeds of up to 54 Mbps. Many wireless
LAN cards and access points support all three standards to avoid incompatibilities.

In Europe wireless LAN development revolves around the HIPERLAN (high performance radio LAN) standards.
The first HIPERLAN standard, HIPERLAN Type 1, is similar to the IEEE 802.11a wireless LAN standard,
with data rates of 20 Mbps at a range of 50 m. This system operates in a 5 GHz band similar to the U-NII band. Its
network architecture is peer-to-peer. The next generation of HIPERLAN, HIPERLAN Type 2, is still under
development, but the goal is to provide data rates on the order of 54 Mbps with a similar range, and also to support access
to cellular, ATM, and IP networks. HIPERLAN Type 2 is also supposed to include support for Quality-of-Service
(QoS); however, it is not yet clear how and to what extent this will be done.

1.4.4 Wide Area Wireless Data Services 

Wide area wireless data services provide wireless data to high-mobility users over a very large coverage area. In 
these systems a given geographical region is serviced by base stations mounted on towers, rooftops, or mountains. 
The base stations can be connected to a backbone wired network or form a multihop ad hoc wireless network. 

Initial wide area wireless data services had very low data rates, below 10 Kbps, which gradually increased 
to 20 Kbps. There were two main players providing this service: Motient and Bell South Mobile Data (formerly 
RAM Mobile Data). Metricom provided a similar service with a network architecture consisting of a large network 
of small inexpensive base stations with small coverage areas. The increased efficiency of the small coverage areas 
allowed for higher data rates in Metricom, 76 Kbps, than in the other wide-area wireless data systems. However, 
the high infrastructure cost for Metricom eventually forced it into bankruptcy, and the system was shut down. Some 
of the infrastructure was bought and is operating in a few areas as Ricochet. 

The cellular digital packet data (CDPD) system is a wide area wireless data service overlaid on the analog
cellular telephone network. CDPD shares the FDMA voice channels of the analog systems, since many of these
channels are idle due to the growth of digital cellular. The CDPD service provides packet data transmission at
rates of 19.2 Kbps, and is available throughout the U.S. However, since newer generations of cellular systems
also provide data services, CDPD is mostly being replaced by these newer services. Thus, wide area wireless data
services have not been very successful, although emerging systems that offer broadband access may have more
appeal.

1.4.5 Broadband Wireless Access

Broadband wireless access provides high-rate wireless communications between a fixed access point and multiple 
terminals. These systems were initially proposed to support interactive video service to the home, but the appli- 
cation emphasis then shifted to providing high speed data access (tens of Mbps) to the Internet, the WWW, and 
to high speed data networks for both homes and businesses. In the U.S. two frequency bands were set aside for 
these systems: part of the 28 GHz spectrum for local distribution systems (local multipoint distribution systems 
or LMDS) and a band in the 2 GHz spectrum for metropolitan distribution systems (multichannel multipoint dis- 
tribution services or MMDS). LMDS represents a quick means for new service providers to enter the already stiff 
competition among wireless and wireline broadband service providers [1, Chapter 2.3]. MMDS is a television 
and telecommunication delivery system with transmission ranges of 30-50 km [1, Chapter 11.11]. MMDS has
the capability to deliver over one hundred digital video TV channels along with telephony and access to emerging 
interactive services such as the Internet. MMDS will mainly compete with existing cable and satellite systems. 
Europe is developing a standard similar to MMDS called Hiperaccess.

WiMAX is an emerging broadband wireless technology based on the IEEE 802.16 standard [20, 21]. The core
802.16 specification is a standard for broadband wireless access systems operating at radio frequencies between 10
GHz and 66 GHz. Data rates of around 40 Mbps will be available for fixed users and 15 Mbps for mobile users,
with a range of several kilometers. Many laptop and PDA manufacturers are planning to incorporate WiMAX once
it becomes available to satisfy demand for constant Internet access and email exchange from any location. WiMAX
will compete with wireless LANs, 3G cellular services, and possibly wireline services like cable and DSL. The
ability of WiMAX to challenge or supplant these systems will depend on its relative performance and cost, which
remain to be seen.

1.4.6 Paging Systems 

Paging systems broadcast a short paging message simultaneously from many tall base stations or satellites
transmitting at very high power (hundreds of watts to kilowatts). Systems with terrestrial transmitters are typically
localized to a particular geographic area, such as a city or metropolitan region, while geosynchronous satellite
transmitters provide national or international coverage. In both types of systems no location management or rout- 
ing functions are needed, since the paging message is broadcast over the entire coverage area. The high complexity 
and power of the paging transmitters allow low-complexity, low-power, pocket paging receivers with a long usage
time from small and lightweight batteries. In addition, the high transmit power allows paging signals to easily 
penetrate building walls. Paging service also costs less than cellular service, both for the initial device and for the 
monthly usage charge, although this price advantage has declined considerably in recent years as cellular prices 
dropped. The low cost, small and lightweight handsets, long battery life, and ability of paging devices to work 
almost anywhere indoors or outdoors are the main reasons for their appeal. 

Early radio paging systems sent analog 1-bit messages signaling a user that someone was trying to reach him
or her. These systems required callback over a landline telephone to obtain the phone number of the paging party. 
The system evolved to allow a short digital message, including a phone number and brief text, to be sent to the 
pagee as well. Radio paging systems were initially extremely successful, with a peak of 50 million subscribers in 
the U.S. alone. However, their popularity started to wane with the widespread penetration and competitive cost of 
cellular telephone systems. Eventually the competition from cellular phones forced paging systems to provide new 
capabilities. Some implemented “answer-back” capability, i.e. two-way communication. This required a major
change in the pager design, since it needed to transmit signals in addition to receiving them, and the
transmission distance to a satellite or distant base station is very large. Paging companies also teamed up with palmtop
computer makers to incorporate paging functions into these devices [5]. Despite these developments, the
market for paging devices has shrunk considerably, although there is still a niche market among doctors and other
professionals who must be reachable anywhere.



1.4.7 Satellite Networks 

Commercial satellite systems are another major component of the wireless communications infrastructure [6, 7]. 
Geosynchronous systems include Inmarsat and OmniTRACS. The former is geared mainly for analog voice transmission
from remote locations. For example, it is commonly used by journalists to provide live reporting from
war zones. The first generation Inmarsat-A system was designed for large (1 m parabolic dish antenna) and rather
expensive terminals. Newer generations of Inmarsat use digital techniques to enable smaller, less expensive terminals,
around the size of a briefcase. Qualcomm’s OmniTRACS provides two-way communications as well as
location positioning. The system is used primarily for alphanumeric messaging and location tracking of trucking
fleets. There are several major difficulties in providing voice and data services over geosynchronous satellites. It
takes a great deal of power to reach these satellites, so handsets are typically large and bulky. In addition, there
is a large round-trip propagation delay: this delay is quite noticeable in two-way voice communication. Geosynchronous
satellites also have fairly low data rates, less than 10 Kbps. For these reasons low-earth-orbit (LEO) satellites
were thought to be a better match for voice and data communications.
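
To get a feel for the size of this delay, the short sketch below computes the propagation delay from orbital
altitude alone. The altitudes are assumptions not given in the text (roughly 35,786 km for a geosynchronous orbit
and an illustrative 1,000 km for a LEO satellite), the satellite is assumed to be directly overhead, and all
processing delays are ignored.

```python
# Propagation delay to a satellite directly overhead (illustrative sketch).
SPEED_OF_LIGHT_M_PER_S = 3e8  # approximate speed of light


def one_hop_delay_s(altitude_km: float) -> float:
    """Delay for one earth -> satellite -> earth hop, in seconds."""
    return 2 * altitude_km * 1e3 / SPEED_OF_LIGHT_M_PER_S


# Assumed altitudes, not taken from the text.
orbits_km = {"GEO": 35_786, "LEO (example)": 1_000}

for name, altitude_km in orbits_km.items():
    hop = one_hop_delay_s(altitude_km)
    # In a conversation the speaker waits for a reply, i.e. two such hops.
    print(f"{name}: one hop {hop * 1e3:.1f} ms, query and reply {2 * hop * 1e3:.1f} ms")
```

Under these assumptions the geosynchronous delay is on the order of a quarter of a second per hop, while the
example LEO delay is a few milliseconds.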

LEO systems require approximately 30-80 satellites to provide global coverage, and plans for deploying 
such constellations were widespread in the late 1990’s. One of the most ambitious of these systems, the Iridium 
constellation, was launched at that time. However, the cost of these satellites, to build, launch, and maintain, is 
much higher than that of terrestrial base stations. Although these LEO systems can certainly complement terrestrial 
systems in low-population areas, and are also appealing to travelers desiring just one handset and phone number 
for global roaming, the growth and diminished cost of cellular prevented many ambitious plans for widespread
LEO voice and data systems from materializing. Iridium was eventually forced into bankruptcy and disbanded, and
most of the other systems were never launched. An exception to these failures was the Globalstar LEO system, 
which currently provides voice and data services over a wide coverage area at data rates under 10 Kbps. Some of 
the Iridium satellites are still operational as well. 

The most appealing use for satellite systems is broadcasting of video and audio over large geographic regions.
In the U.S. approximately 1 in 8 homes has direct broadcast satellite service, and satellite radio is emerging as a
popular service as well. Similar audio and video satellite broadcasting services are widespread in Europe. Satellites 
are best tailored for broadcasting, since they cover a wide area and are not compromised by an initial propagation 
delay. Moreover, the cost of the system can be amortized over many years and many users, making the service 
quite competitive with terrestrial entertainment broadcasting systems. 

1.4.8 Low-Cost Low-Power Radios: Bluetooth and Zigbee 

As radios decrease their cost and power consumption, it becomes feasible to embed them in more types of electronic 
devices, which can be used to create smart homes, sensor networks, and other compelling applications. Two radios 
have emerged to support this trend: Bluetooth and Zigbee. 

Bluetooth² radios provide short range connections between wireless devices along with rudimentary networking
capabilities. The Bluetooth standard is based on a tiny microchip incorporating a radio transceiver that is built
into digital devices. The transceiver takes the place of a connecting cable for devices such as cell phones, laptop
and palmtop computers, portable printers and projectors, and network access points. Bluetooth is mainly for short
range communications, e.g. from a laptop to a nearby printer or from a cell phone to a wireless headset. Its normal
range of operation is 10 m (at 1 mW transmit power), and this range can be increased to 100 m by increasing the
transmit power to 100 mW. The system operates in the unlicensed 2.4 GHz frequency band, hence it can be used
worldwide without any licensing issues. The Bluetooth standard provides 1 asynchronous data channel at 723.2
Kbps. In this mode, also known as Asynchronous Connection-Less, or ACL, there is a reverse channel with a data
rate of 57.6 Kbps. The specification also allows up to three synchronous channels each at a rate of 64 Kbps. This
mode, also known as Synchronous Connection Oriented or SCO, is mainly used for voice applications such as
headsets, but can also be used for data. These different modes result in an aggregate bit rate of approximately 1
Mbps. Routing of the asynchronous data is done via a packet switching protocol based on frequency hopping at
1600 hops per second. There is also a circuit switching protocol for the synchronous data.

² The Bluetooth standard is named after Harald I Bluetooth, the king of Denmark between 940 and 985 AD who united Denmark and
Norway. Bluetooth proposes to unite devices via radio connections, hence the inspiration for its name.
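
One way to see where an aggregate figure near 1 Mbps could come from is simply to add the channel rates quoted
above. The sketch below does that arithmetic. Whether all of these channels can be active at the same time is
not stated in the text, so the sum should be read as an illustration rather than as a statement about the
standard; the function and constant names are hypothetical.

```python
# Channel rates quoted in the text, in Kbps.
ACL_FORWARD_KBPS = 723.2  # asynchronous data channel
ACL_REVERSE_KBPS = 57.6   # reverse channel in ACL mode
SCO_CHANNEL_KBPS = 64.0   # one synchronous (voice) channel
MAX_SCO_CHANNELS = 3      # the text allows up to three SCO channels


def aggregate_rate_kbps(num_sco_channels: int = MAX_SCO_CHANNELS) -> float:
    """Sum of the ACL forward, ACL reverse, and SCO channel rates."""
    if not 0 <= num_sco_channels <= MAX_SCO_CHANNELS:
        raise ValueError("number of SCO channels must be between 0 and 3")
    return ACL_FORWARD_KBPS + ACL_REVERSE_KBPS + num_sco_channels * SCO_CHANNEL_KBPS


print(f"Aggregate rate with three SCO channels: {aggregate_rate_kbps():.1f} Kbps")
```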

Bluetooth uses frequency-hopping for multiple access with a carrier spacing of 1 MHz. Typically, up to 80 
different frequencies are used, for a total bandwidth of 80 MHz. At any given time, the bandwidth available is 
1 MHz, with a maximum of eight devices sharing the bandwidth. Different logical channels (different hopping 
sequences) can simultaneously share the same 80 MHz bandwidth. Collisions will occur when devices in different 
piconets, on different logical channels, happen to use the same hop frequency at the same time. As the number of 
piconets in an area increases, the number of collisions increases, and performance degrades. 
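
The growth in collisions can be illustrated with a simple probabilistic sketch. It assumes, beyond what the text
states, that each piconet chooses its hop frequency independently and uniformly at random among the 80
frequencies, and that hop intervals are aligned in time. Real hopping sequences are pseudo-random and not
synchronized across piconets, so the numbers are only indicative.

```python
def collision_probability(num_piconets: int, num_frequencies: int = 80) -> float:
    """Chance that one hop of a given piconet collides with any other piconet.

    Assumes every piconet picks its hop frequency independently and uniformly
    at random, and that hop intervals are aligned in time.
    """
    if num_piconets < 1 or num_frequencies < 1:
        raise ValueError("need at least one piconet and one frequency")
    # Probability that one particular other piconet avoids our frequency.
    p_miss = 1.0 - 1.0 / num_frequencies
    # All (num_piconets - 1) other piconets must miss for the hop to be clean.
    return 1.0 - p_miss ** (num_piconets - 1)


for n in (2, 5, 10, 20):
    print(f"{n:2d} piconets: collision probability per hop = {collision_probability(n):.3f}")
```

With two piconets the per-hop collision probability under this model is 1/80, and it rises steadily as more
piconets share the band.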

The Bluetooth standard was developed jointly by 3Com, Ericsson, Intel, IBM, Lucent, Microsoft, Motorola,
Nokia, and Toshiba. The standard has now been adopted by over 1300 manufacturers, and many consumer electronic
products incorporate Bluetooth, including wireless headsets for cell phones, wireless USB or RS232 connectors,
wireless PCMCIA cards, and wireless set-top boxes.

The ZigBee³ radio specification is designed for lower cost and power consumption than Bluetooth [5]. The
specification is based on the IEEE 802.15.4 standard. The radio operates in the same ISM band as Bluetooth, and 
is capable of connecting 255 devices per network. The specification supports data rates of up to 250 Kbps at a 
range of up to 30 m. These data rates are slower than Bluetooth, but in exchange the radio consumes significantly 
less power with a larger transmission range. The goal of ZigBee is to provide radio operation for months or years 
without recharging, thereby targeting applications such as sensor networks and inventory tags. 

1.4.9 Ultrawideband Radios 

Ultrawideband (UWB) radios are extremely wideband radios with very high potential data rates [18, 6]. The concept
of ultrawideband communications actually originated with Marconi's spark gap transmitter, which occupied
a very wide bandwidth. However, since only a single low-rate user could occupy the spectrum, wideband communications
was abandoned in favor of more efficient communication techniques. The renewed interest in wideband
communications was spurred by the FCC’s decision in 2002 to allow operation of UWB devices as systems underlaid
beneath existing users over a 7 GHz range of frequencies. These systems can operate either at baseband or at
a carrier frequency in the 3.6-10.1 GHz range. The underlay in theory interferes with all systems in that frequency
range, including critical safety and military systems, unlicensed systems such as 802.11 wireless and Bluetooth, 
and cellular systems where operators paid billions of dollars for dedicated spectrum use. The FCC’s ruling was 
quite controversial given the vested interest in interference-free spectrum of these users. To minimize the impact of 
UWB on primary band users, the FCC put in place severe transmit power restrictions. This requires UWB devices 
to be within close proximity of their intended receiver. 

UWB radios come with unique advantages that have long been appreciated by the radar and communications 
communities. Their wideband nature allows UWB signals to easily penetrate through obstacles and provides very 
precise ranging capabilities. Moreover, the available UWB bandwidth has the potential for very high data rates. 
Finally, the power restrictions dictate that the devices can be small with low power consumption. 

Initial UWB systems used ultra-short pulses with simple amplitude or position modulation. Multipath can
significantly degrade performance of such systems, and proposals to mitigate the effects of multipath include
equalization and multicarrier modulation. Precise and rapid synchronization is also a big challenge for these
systems. While many technical challenges remain, the appeal of UWB technology has sparked great interest both
commercially and in the research community to address these issues.

³ ZigBee takes its name from the dance that honey bees use to communicate information about new-found food sources to other members
of the colony.



1.5 The Wireless Spectrum 

1.5.1 Methods for Spectrum Allocation 

Most countries have government agencies responsible for allocating and controlling the use of the radio spectrum. 
In the U.S. spectrum is allocated by the Federal Communications Commission (FCC) for commercial use and by 
the Office of Spectral Management (OSM) for military use. Commercial spectral allocation is governed in Europe 
by the European Telecommunications Standards Institute (ETSI) and globally by the International Telecommuni- 
cations Union (ITU). Governments decide how much spectrum to allocate between commercial and military use, 
and this decision is dynamic depending on need. Historically the FCC allocated spectral blocks for specific uses 
and assigned licenses to use these blocks to specific groups or companies. For example, in the 1980s the FCC 
allocated frequencies in the 800 MHz band for analog cellular phone service, and provided spectral licenses to two 
operators in each geographical area based on a number of criteria. While the FCC and regulatory bodies in other 
countries still allocate spectral blocks for specific purposes, these blocks are now commonly assigned through 
spectral auctions to the highest bidder. While some argue that this market-based method is the fairest way for 
governments to allocate the limited spectral resource, and it provides significant revenue to the government be- 
sides, there are others who believe that this mechanism stifles innovation, limits competition, and hurts technology 
adoption. Specifically, the high cost of spectrum dictates that only large companies or conglomerates can purchase 
it. Moreover, the large investment required to obtain spectrum can delay the ability to invest in infrastructure for 
system rollout and results in very high initial prices for the end user. The 3G spectral auctions in Europe, in which 
several companies ultimately defaulted, have provided fuel to the fire against spectral auctions. 

In addition to spectral auctions, spectrum can be set aside in specific frequency bands that are free to use without
a license according to a specific set of etiquette rules. The rules may correspond to a specific communications
standard, power levels, etc. The purpose of these unlicensed bands is to encourage innovation and low-cost implementation.
Many extremely successful wireless systems operate in unlicensed bands, including wireless LANs,
Bluetooth, and cordless phones. A major difficulty of unlicensed bands is that they can be killed by their own
success. If many unlicensed devices in the same band are used in close proximity, they generate much interference
to each other, which can make the band unusable.

Underlay systems are another alternative for allocating spectrum. An underlay system operates as a secondary
user in a frequency band with other primary users. Operation of secondary users is typically restricted so that
primary users experience minimal interference. This is usually accomplished by restricting the power/Hz of the
secondary users. UWB is an example of an underlay system, as are unlicensed systems in the ISM frequency bands.
Such underlay systems can be extremely controversial given the complexity of characterizing how interference
affects the primary users. Yet the trend towards spectrum allocation for underlays appears to be accelerating,
mainly due to the scarcity of available spectrum for new systems and applications. 

Satellite systems cover large areas spanning many countries and sometimes the globe. For wireless systems 
that span multiple countries, spectrum is allocated by the International Telecommunications Union Radio Commu- 
nications group (ITU-R). The standards arm of this body, ITU-T, adopts telecommunication standards for global 
systems that must interoperate with each other across national boundaries. 

There is some movement within regulatory bodies worldwide to change the way spectrum is allocated. Indeed, 
the basic mechanisms for spectral allocation have not changed much since the inception of the regulatory bodies in 
the early to mid 1900’s, although spectral auctions and underlay systems are relatively new. The goal of changing
spectrum allocation policy is to take advantage of the technological advances in radios to make spectrum allocation
more efficient and flexible. One compelling idea is the notion of a smart or cognitive radio. This type of radio can 
sense its spectral environment to determine dimensions in time, space, and frequency where it would not cause 
interference to other users even at moderate to high transmit powers. If such radios could operate over a very wide 
frequency band, it would open up huge amounts of new bandwidth and tremendous opportunities for new wireless 
systems and applications. However, many technology and policy hurdles must be overcome to allow such a radical 
change in spectrum allocation. 

1.5.2 Spectrum Allocations for Existing Systems 

Most wireless applications reside in the radio spectrum between 30 MHz and 30 GHz. These frequencies are 
natural for wireless systems since they are not affected by the earth's curvature, require only moderately sized 
antennas, and can penetrate the ionosphere. Note that the required antenna size for good reception scales with the
signal wavelength, i.e. it is inversely proportional to the signal frequency, so moving systems to a higher frequency
allows for more compact antennas. However, received signal power with nondirectional antennas is proportional to
the inverse of frequency squared, so it is harder to cover large distances with higher frequency signals.
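
The inverse-square dependence of received power on frequency can be checked numerically with the free-space
formula for unity-gain (nondirectional) antennas, in which the received-to-transmit power ratio is
(wavelength / (4 pi d))^2. The sketch below is a minimal illustration under that free-space assumption; the
link distance and the two frequencies are hypothetical values chosen only for the comparison.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 3e8  # approximate speed of light


def wavelength_m(frequency_hz: float) -> float:
    """Signal wavelength in meters."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


def free_space_gain_db(frequency_hz: float, distance_m: float) -> float:
    """Received-to-transmit power ratio in dB for unity-gain antennas in free space."""
    ratio = wavelength_m(frequency_hz) / (4 * math.pi * distance_m)
    return 20 * math.log10(ratio)


distance_m = 1000.0  # hypothetical link distance
for frequency_hz in (1e9, 10e9):
    print(
        f"{frequency_hz / 1e9:4.0f} GHz: wavelength {wavelength_m(frequency_hz) * 100:.1f} cm, "
        f"free-space gain {free_space_gain_db(frequency_hz, distance_m):.1f} dB"
    )
```

A tenfold increase in frequency shortens the wavelength by a factor of ten and lowers the received power by
20 dB at the same distance, which is the trade-off between compact antennas and coverage described above.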

As discussed in the previous section, spectrum is allocated either in licensed bands (which regulatory bodies 
assign to specific operators) or in unlicensed bands (which can be used by any system subject to certain operational 
requirements). The following table shows the licensed spectrum allocated to major commercial wireless systems 
in the U.S. today. There are similar allocations in Europe and Asia.



System                                                    Frequency Band
AM Radio                                                  535-1605 KHz
FM Radio                                                  88-108 MHz
Broadcast TV (Channels 2-6)                               54-88 MHz
Broadcast TV (Channels 7-13)                              174-216 MHz
Broadcast TV (UHF)                                        470-806 MHz
3G Broadband Wireless                                     746-764 MHz, 776-794 MHz
3G Broadband Wireless                                     1.7-1.85 GHz, 2.5-2.69 GHz
1G and 2G Digital Cellular Phones                         806-902 MHz
Personal Communications Service (2G Cell Phones)          1.85-1.99 GHz
Wireless Communications Service                           2.305-2.32 GHz, 2.345-2.36 GHz
Satellite Digital Radio                                   2.32-2.325 GHz
Multichannel Multipoint Distribution Service (MMDS)       2.15-2.68 GHz
Digital Broadcast Satellite (Satellite TV)                12.2-12.7 GHz
Local Multipoint Distribution Service (LMDS)              27.5-29.5 GHz, 31-31.3 GHz
Fixed Wireless Services                                   38.6-40 GHz



Note that digital TV is slated for the same bands as broadcast TV, so all broadcasters must eventually switch 
from analog to digital transmission. Also, the 3G broadband wireless spectrum is currently allocated to UHF TV 
stations 60-69, but is slated to be reallocated. Both 1G analog and 2G digital cellular services occupy the same 
cellular band at 800 MHz, and the cellular service providers decide how much of the band to allocate between 
digital and analog service. 

Unlicensed spectrum is allocated by the governing body within a given country. Often countries try to match
their frequency allocation for unlicensed use so that technology developed for that spectrum is compatible worldwide.
The following table shows the unlicensed spectrum allocations in the U.S.

Band                                                      Frequency Range
ISM Band I (Cordless phones, 1G WLANs)                    902-928 MHz
ISM Band II (Bluetooth, 802.11b WLANs)                    2.4-2.4835 GHz
ISM Band III (Wireless PBX)                               5.725-5.85 GHz
U-NII Band I (Indoor systems, 802.11a WLANs)              5.15-5.25 GHz
U-NII Band II (short outdoor and campus applications)     5.25-5.35 GHz
U-NII Band III (long outdoor and point-to-point links)    5.725-5.825 GHz

ISM Band I has licensed users transmitting at high power that interfere with the unlicensed users. Therefore, 
the requirements for unlicensed use of this band are highly restrictive and performance is somewhat poor. The U-NII
bands have a total of 300 MHz of spectrum in three separate 100 MHz bands, with slightly different restrictions on 
each band. Many unlicensed systems operate in these bands. 

1.6 Standards 

Communication systems that interact with each other require standardization. Standards are typically decided on
by national or international committees: in the U.S. the TIA plays this role. These committees adopt standards
that are developed by other organizations. The IEEE is the major player for standards development in the United
States, while ETSI plays this role in Europe. Both groups follow a lengthy process for standards development
which entails input from companies and other interested parties, and a long and detailed review process. The
standards process is a large time investment, but companies participate since if they can incorporate their ideas 
into the standard, this gives them an advantage in developing the resulting system. In general standards do not 
include all the details on all aspects of the system design. This allows companies to innovate and differentiate their 
products from other standardized systems. The main goal of standardization is for systems to interoperate with 
other systems following the same standard. 

In addition to ensuring interoperability, standards also enable economies of scale and pressure prices lower.
For example, wireless LANs typically operate in the unlicensed spectral bands, so they are not required to follow 
a specific standard. The first generation of wireless LANs were not standardized, so specialized components 
were needed for many systems, leading to excessively high cost which, coupled with poor performance, led to 
very limited adoption. This experience led to a strong push to standardize the next wireless LAN generation, 
which resulted in the highly successful IEEE 802.11 family of standards. Future generations of wireless LANs are
expected to be standardized, including the now emerging IEEE 802.11a standard in the 5 GHz band.

There are, of course, disadvantages to standardization. The standards process is not perfect, as company par- 
ticipants often have their own agenda which does not always coincide with the best technology or best interests of 
the consumers. In addition, the standards process must be completed at some point, after which time it becomes 
more difficult to add new innovations and improvements to an existing standard. Finally, the standards process can 
become very politicized. This happened with the second generation of cellular phones in the U.S., which ultimately 
led to the adoption of two different standards, a bit of an oxymoron. The resulting delays and technology split put 
the U.S. well behind Europe in the development of 2nd generation cellular systems. Despite its flaws, standard- 
ization is clearly a necessary and often beneficial component of wireless system design and operation. However, it 
would benefit everyone in the wireless technology industry if some of the problems in the standardization process 
could be mitigated.

Chapter 1 Problems

1. As storage capability increases, we can store larger and larger amounts of data on smaller and smaller storage
devices. Indeed, we can envision microscopic computer chips storing terabytes of data. Suppose this data
is to be transferred over some distance. Discuss the pros and cons of putting a large number of these storage
devices in a truck and driving them to their destination rather than sending the data electronically.

2. Describe two technical advantages and disadvantages of wireless systems that use bursty data transmission 
rather than continuous data transmission. 

3. Fiber optic cable typically exhibits a probability of bit error of $P_b = 10^{-12}$. A form of wireless modulation,
DPSK, has $P_b = \frac{1}{2\bar{\gamma}}$ in some wireless channels, where $\bar{\gamma}$ is the average SNR. Find the average SNR required
to achieve the same $P_b$ in the wireless channel as in the fiber optic cable. Due to this extremely high required
SNR, wireless channels typically have $P_b$ much larger than $10^{-12}$.

4. Find the round-trip delay of data sent between a satellite and the earth for LEO, MEO, and GEO satellites 
assuming the speed of light is $3 \times 10^8$ m/s. If the maximum acceptable delay for a voice system is 30
milliseconds, which of these satellite systems would be acceptable for two-way voice communication? 

5. Figure 1.1 indicates a relatively flat growth for wireless data between 1995 and 2000. What applications 
might significantly increase the growth rate of wireless data users?

6. This problem illustrates some of the economic issues facing service providers as they migrate away from 
voice-only systems to mixed-media systems. Suppose you are a service provider with 120 KHz of bandwidth
which you must allocate between voice and data users. The voice users require 20 KHz of bandwidth, and
the data users require 60 KHz of bandwidth. So, for example, you could allocate all of your bandwidth to
voice users, resulting in 6 voice channels, or you could divide the bandwidth to have one data channel and 
three voice channels, etc. Suppose further that this is a time-division system, with timeslots of duration 
T. All voice and data call requests come in at the beginning of a timeslot and both types of calls last T 
seconds. There are six independent voice users in the system: each of these users requests a voice channel 
with probability .8 and pays $.20 if his call is processed. There are two independent data users in the system:
each of these users requests a data channel with probability .5 and pays $1 if his call is processed. How 
should you allocate your bandwidth to maximize your expected revenue? 

7. Describe three disadvantages of using a wireless LAN instead of a wired LAN. For what applications will 
these disadvantages be outweighed by the benefits of wireless mobility? For what applications will the
disadvantages override the advantages?

8. Cellular systems are migrating to smaller cells to increase system capacity. Name at least three design issues
which are complicated by this trend. 

9. Why does minimizing reuse distance maximize spectral efficiency of a cellular system? 

10. This problem demonstrates the capacity increase as cell size decreases. Consider a square city that is 100 
square kilometers. Suppose you design a cellular system for this city with square cells, where every cell 
(regardless of cell size) has 100 channels so can support 100 active users (in practice the number of users 
that can be supported per cell is mostly independent of cell size as long as the propagation model and power 
scale appropriately). 

(a) What is the total number of active users that your system can support for a cell size of 1 square kilometer?

(b) What cell size would you use if you require that your system support 250,000 active users?

Now we consider some financial implications based on the fact that users do not talk continuously. Assume 
that Friday from 5-6 pm is the busiest hour for cell phone users. During this time, the average user places 
a single call, and this call lasts two minutes. Your system should be designed such that the subscribers will 
tolerate no greater than a two percent blocking probability during this peak hour. (Blocking probability is
computed using the Erlang B model: $P_b = \frac{A^C/C!}{\sum_{k=0}^{C} A^k/k!}$, where $C$ is the number of channels
and $A = U\mu H$ for $U$ the number of users, $\mu$ the average number of call requests per unit time, and $H$ the
average duration of a call. See Section 3.6 of Rappaport, EE276 notes, or any basic networks book for more
details.)

(c) How many total subscribers can be supported in the macrocell system (1 square Km cells) and in the 
microcell system (with cell size from part (b))?

(d) If a base station costs $500,000, what are the base station costs for each system? 

(e) If users pay 50 dollars a month in both systems, what will be the monthly revenue in each case? How
long will it take to recoup the infrastructure (base station) cost for each system? 

11. How many CDPD data lines are needed to achieve the same data rate as the average rate of Wi-Max?
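
As a computational aside to the Erlang B model quoted in Problem 10, the sketch below evaluates the blocking
probability in two ways: directly from the formula, and through the standard recursion
B(0) = 1, B(c) = A B(c-1) / (c + A B(c-1)), which avoids large factorials. The function names and the numerical
values are hypothetical and are not the data of the problem.

```python
import math


def erlang_b_direct(traffic_erlangs: float, num_channels: int) -> float:
    """Blocking probability evaluated directly from the Erlang B formula."""
    numerator = traffic_erlangs ** num_channels / math.factorial(num_channels)
    denominator = sum(
        traffic_erlangs ** k / math.factorial(k) for k in range(num_channels + 1)
    )
    return numerator / denominator


def erlang_b(traffic_erlangs: float, num_channels: int) -> float:
    """Same quantity via the recursion on the number of channels."""
    blocking = 1.0  # with zero channels every call is blocked
    for c in range(1, num_channels + 1):
        blocking = traffic_erlangs * blocking / (c + traffic_erlangs * blocking)
    return blocking


# Hypothetical example: 5 Erlangs of offered traffic on 10 channels.
print(f"direct:    {erlang_b_direct(5.0, 10):.4f}")
print(f"recursive: {erlang_b(5.0, 10):.4f}")
```

The recursion is usually the better choice when the number of channels is large, since the direct formula
involves very large powers and factorials.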

Chapter 2

Path Loss and Shadowing 



The wireless radio channel poses a severe challenge as a medium for reliable high-speed communication. It is not 
only susceptible to noise, interference, and other channel impediments, but these impediments change over time 
in unpredictable ways due to user movement. In this chapter we will characterize the variation in received signal 
power over distance due to path loss and shadowing. Path loss is caused by dissipation of the power radiated by the 
transmitter as well as effects of the propagation channel. Path loss models generally assume that path loss is the 
same at a given transmit-receive distance¹. Shadowing is caused by obstacles between the transmitter and receiver
that attenuate signal power through absorption, reflection, scattering, and diffraction. When the attenuation is very 
strong, the signal is blocked. Variation due to path loss occurs over very large distances (100-1000 meters), whereas 
variation due to shadowing occurs over distances proportional to the length of the obstructing object (10-100 meters 
in outdoor environments and less in indoor environments). Since variations due to path loss and shadowing occur 
over relatively large distances, this variation is sometimes referred to as large-scale propagation effects. Chapter 3
will deal with variation due to the constructive and destructive addition of multipath signal components. Variation
due to multipath occurs over very short distances, on the order of the signal wavelength, so these variations are
sometimes referred to as small-scale propagation effects. Figure 2.1 illustrates the ratio of the received-to-transmit
power in dB versus log-distance for the combined effects of path loss, shadowing, and multipath. 

After a brief introduction and description of our signal model, we present the simplest model for signal 
propagation: free space path loss. A signal propagating between two points with no attenuation or reflection 
follows the free space propagation law. We then describe ray tracing propagation models. These models are used
to approximate wave propagation according to Maxwell's equations, and are accurate models when the number
of multipath components is small and the physical environment is known. Ray tracing models depend heavily
on the geometry and dielectric properties of the region through which the signal propagates. We also describe
empirical models with parameters based on measurements for both indoor and outdoor channels. We then present
a simple generic model with a few parameters that captures the primary impact of path loss in system analysis. A
log-normal model for shadowing based on a large number of shadowing objects is also given. When the number
of multipath components is large, or the geometry and dielectric properties of the propagation environment are
unknown, statistical models must be used. These statistical multipath models will be described in Chapter 3. 

While this chapter gives a brief overview of channel models for path loss and shadowing, comprehensive 
coverage of channel and propagation models at different frequencies of interest merits a book in its own right, and 
in fact there are several excellent texts on this topic [3, 5]. Channel models for specialized systems, e.g. multiple
antenna and ultrawideband systems, can be found in [65, 66]. 

¹ This assumes that the path loss model does not include shadowing effects.

Figure 2.1: Path Loss, Shadowing and Multipath versus Distance.



2.1 Radio Wave Propagation 

The initial understanding of radio wave propagation goes back to the pioneering work of James Clerk Maxwell, 
who in 1864 formulated the theory of electromagnetic propagation which predicted the existence of radio waves. In 
1887, the physical existence of these waves was demonstrated by Heinrich Hertz. However, Hertz saw no practical 
use for radio waves, reasoning that since audio frequencies were low, where propagation was poor, radio waves 
could never carry voice. The work of Maxwell and Hertz initiated the field of radio communications: in 1894 Oliver
Lodge used these principles to build the first wireless communication system; however, its transmission distance
was limited to 150 meters. By 1897 the entrepreneur Guglielmo Marconi had managed to send a radio signal from 
the Isle of Wight to a tugboat 18 miles away, and in 1901 Marconi’s wireless system could traverse the Atlantic 
ocean. These early systems used telegraph signals for communicating information. The first transmission of voice 
and music was done by Reginald Fessenden in 1906 using a form of amplitude modulation, which got around the 
propagation limitations at low frequencies observed by Hertz by translating signals to a higher frequency, as is 
done in all wireless systems today. 

Electromagnetic waves propagate through environments where they are reflected, scattered, and diffracted
by walls, terrain, buildings, and other objects. The ultimate details of this propagation can be obtained by solving
Maxwell’s equations with boundary conditions that express the physical characteristics of these obstructing objects.
This requires the calculation of the Radar Cross Section (RCS) of large and complex structures. Since these
calculations are difficult, and many times the necessary parameters are not available, approximations have been
developed to characterize signal propagation without resorting to Maxwell’s equations. 

The most common approximations use ray-tracing techniques. These techniques approximate the propaga- 
tion of electromagnetic waves by representing the wavefronts as simple particles: the model determines the re- 
flection and refraction effects on the wavefront but ignores the more complex scattering phenomenon predicted by 
Maxwell’s coupled differential equations. The simplest ray-tracing model is the two-ray model, which accurately 
describes signal propagation when there is one direct path between the transmitter and receiver and one reflected 
path. The reflected path typically bounces off the ground, and the two-ray model is a good approximation for 
propagation along highways, rural roads, and over water. We next consider more complex models with additional 
reflected, scattered, or diffracted components. Many propagation environments are not accurately captured by
ray tracing models. In these cases it is common to develop analytical models based on empirical measurements,
and we will discuss several of the most common of these empirical models.

Often the complexity and variability of the radio channel make it difficult to obtain an accurate deterministic
channel model. For these cases statistical models are often used. The attenuation caused by signal path obstructions
such as buildings or other objects is typically characterized statistically, as described in Section 2.7. Statistical
models are also used to characterize the constructive and destructive interference for a large number of multipath
components, as described in Chapter 3. Statistical models are most accurate in environments with fairly regular
geometries and uniform dielectric properties. Indoor environments tend to be less regular than outdoor environments,
since the geometric and dielectric characteristics change dramatically depending on whether the indoor
environment is an open factory, cubicled office, or metal machine shop. For these environments computer-aided
modeling tools are available to predict signal propagation characteristics [1].

2.2 Transmit and Receive Signal Models 

Our models are developed mainly for signals in the UHF and SHF bands, from 0.3-3 GHz and 3-30 GHz,
respectively. This range of frequencies is quite favorable for wireless system operation due to its propagation
characteristics and relatively small required antenna size. We assume the transmission distances on the earth are small
enough so as not to be affected by the earth's curvature.

All transmitted and received signals we consider are real. That is because modulators are built using oscillators
that generate real sinusoids (not complex exponentials). While we model communication channels using a complex
frequency response for analytical simplicity, in fact the channel just introduces an amplitude and phase change at
each frequency of the transmitted signal so that the received signal is also real. Real modulated and demodulated
signals are often represented as the real part of a complex signal to facilitate analysis. This model gives rise to
the complex baseband representation of bandpass signals, which we use for our transmitted and received signals.
More details on the complex baseband representation for bandpass signals and systems can be found in Appendix
A.

We model the transmitted signal as 

\[
\begin{aligned}
s(t) &= \Re\{u(t) e^{j2\pi f_c t}\} \\
     &= \Re\{u(t)\} \cos(2\pi f_c t) - \Im\{u(t)\} \sin(2\pi f_c t) \\
     &= x(t) \cos(2\pi f_c t) - y(t) \sin(2\pi f_c t),
\end{aligned}
\tag{2.1}
\]

where $u(t) = x(t) + jy(t)$ is a complex baseband signal with in-phase component $x(t) = \Re\{u(t)\}$, quadrature
component $y(t) = \Im\{u(t)\}$, bandwidth $B_u$, and power $P_u$. The signal u(t) is called the complex envelope or
complex lowpass equivalent signal of s(t). We call u(t) the complex envelope of s(t) since the magnitude of u(t)
is the magnitude of s(t) and the phase of u(t) is the phase of s(t). This phase includes any carrier phase offset.
This is a standard representation for bandpass signals with bandwidth $B \ll f_c$, as it allows signal manipulation
via u(t) irrespective of the carrier frequency. The power in the transmitted signal s(t) is $P_t = P_u/2$.
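
The relationship between the complex envelope and the real passband signal in (2.1) can be checked numerically.
The following is a minimal sketch, not part of the original text: the function names, the constant envelope
u = 3 + 4j, and the 1 kHz carrier are illustrative assumptions. It confirms that the exponential form and the
in-phase/quadrature form of (2.1) agree, and that the time-averaged power of s(t) is half the envelope power.

```python
import cmath
import math

def passband_from_envelope(u, fc, t):
    """s(t) = Re{u e^{j 2 pi fc t}} for one complex envelope sample u at time t."""
    return (u * cmath.exp(2j * math.pi * fc * t)).real

def passband_from_iq(x, y, fc, t):
    """The in-phase/quadrature form x cos(2 pi fc t) - y sin(2 pi fc t) of (2.1)."""
    w = 2 * math.pi * fc * t
    return x * math.cos(w) - y * math.sin(w)

# Illustrative values (assumptions, not from the text): constant envelope, 1 kHz carrier.
u = 3 + 4j                 # |u|^2 = 25, so P_u = 25
fc = 1000.0
n = 10000                  # samples spread over exactly 10 carrier cycles
times = [k * (10 / fc) / n for k in range(n)]

samples = [passband_from_envelope(u, fc, t) for t in times]
p_t = sum(s * s for s in samples) / n      # time-averaged power of s(t)
p_u = abs(u) ** 2                          # power of the (constant) envelope
print(p_t, p_u / 2)                        # the two values agree closely
```

A constant envelope is used only to keep the power check simple; the same two functions apply sample by sample to a
time-varying u(t).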

The received signal will have a similar form: 

\[ r(t) = \Re\{v(t) e^{j2\pi f_c t}\}, \tag{2.2} \]

where the complex baseband signal v(t) will depend on the channel through which s(t) propagates. In particular, 
as discussed in Appendix A, if s(t) is transmitted through a time-invariant channel then v(t) = u(t) * c(t), where 
c(t) is the equivalent lowpass channel impulse response for the channel. Time-varying channels will be treated in 
Chapter 3. The received signal may have a Doppler shift of $f_D = v \cos\theta/\lambda$ associated with it, where
$\theta$ is the arrival angle of the received signal relative to the direction of motion, $v$ is the receiver velocity
towards the transmitter in the direction of motion, and $\lambda = c/f_c$ is the signal wavelength ($c = 3 \times 10^8$ m/s
is the speed of light). The geometry associated with the Doppler shift is shown in Fig. 2.2. The Doppler shift results
from the fact that transmitter or receiver movement over a short time interval $\Delta t$ causes a slight change in
distance $\Delta d = v \Delta t \cos\theta$ that the transmitted signal needs to travel to the receiver. The phase change
due to this path length difference is $\Delta\phi = 2\pi v \Delta t \cos\theta/\lambda$. The Doppler frequency is then
obtained from the relationship between signal frequency and phase:

\[ f_D = \frac{1}{2\pi} \frac{\Delta\phi}{\Delta t} = v \cos\theta/\lambda. \tag{2.3} \]

If the receiver is moving towards the transmitter, i.e. $-\pi/2 < \theta < \pi/2$, then the Doppler frequency is positive,
otherwise it is negative. We will ignore the Doppler term in the free-space and ray tracing models of this chapter,
since for typical vehicle speeds (75 km/hr) and frequencies (around 1 GHz), it is on the order of 100 Hz [2].
However, we will include Doppler effects in Chapter 3 on statistical fading models.
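
As a quick numerical check of (2.3), the short sketch below evaluates the Doppler frequency for the vehicle speed
and carrier frequency quoted above. The function name and argument order are illustrative assumptions; the speed of
light is taken as 3 x 10^8 m/s, as in the text.

```python
import math

C = 3e8  # speed of light in m/s, the value used in the text

def doppler_shift(speed_mps, theta_rad, fc_hz):
    """Doppler frequency f_D = v cos(theta) / lambda, with lambda = c / f_c, as in (2.3)."""
    wavelength = C / fc_hz
    return speed_mps * math.cos(theta_rad) / wavelength

v = 75 / 3.6                             # 75 km/hr expressed in m/s
print(doppler_shift(v, 0.0, 1e9))        # about 69.4 Hz: moving straight toward the transmitter
print(doppler_shift(v, math.pi, 1e9))    # about -69.4 Hz: moving directly away
```

The result of roughly 70 Hz is consistent with the "order of 100 Hz" figure quoted for typical vehicle speeds.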



Figure 2.2: Geometry Associated with Doppler Shift.

Suppose s(t) of power $P_t$ is transmitted through a given channel, with corresponding received signal r(t) of
power $P_r$, where $P_r$ is averaged over any random variations due to shadowing. We define the linear path loss of
the channel as the ratio of transmit power to receive power:

\[ P_L = \frac{P_t}{P_r}. \tag{2.4} \]

We define the path loss of the channel as the dB value of the linear path loss or, equivalently, the difference in dB
between the transmitted and received signal power:

\[ P_L\ \text{dB} = 10 \log_{10} \frac{P_t}{P_r}\ \text{dB}. \tag{2.5} \]

In general the dB path loss is a nonnegative number since the channel does not contain active elements, and thus
can only attenuate the signal. The dB path gain is defined as the negative of the dB path loss: $P_G = -P_L =
10 \log_{10}(P_r/P_t)$ dB, which is generally a negative number. With shadowing the received power will include the
effects of path loss and an additional random component due to blockage from objects, as we discuss in Section 2.7.

2.3 Free-Space Path Loss



Consider a signal transmitted through free space to a receiver located at distance d from the transmitter. Assume
there are no obstructions between the transmitter and receiver and the signal propagates along a straight line
between the two. The channel model associated with this transmission is called a line-of-sight (LOS) channel, and
the corresponding received signal is called the LOS signal or ray. Free-space path loss introduces a complex scale
factor [3], resulting in the received signal

\[ r(t) = \Re\left\{ \frac{\lambda \sqrt{G_l}\, e^{-j2\pi d/\lambda}}{4\pi d}\, u(t) e^{j2\pi f_c t} \right\}, \tag{2.6} \]

where $\sqrt{G_l}$ is the product of the transmit and receive antenna field radiation patterns in the LOS direction. The
phase shift $e^{-j2\pi d/\lambda}$ is due to the distance d the wave travels.

The power in the transmitted signal s(t) is $P_t$, so the ratio of received to transmitted power from (2.6) is

\[ \frac{P_r}{P_t} = \left[ \frac{\sqrt{G_l}\, \lambda}{4\pi d} \right]^2. \tag{2.7} \]



Thus, the received signal power falls off inversely proportional to the square of the distance d between the transmit
and receive antennas. We will see in the next section that for other signal propagation models, the received signal
power falls off more quickly relative to this distance. The received signal power is also proportional to the square
of the signal wavelength, so as the carrier frequency increases, the received power decreases. This dependence of
received power on the signal wavelength $\lambda$ is due to the effective area of the receive antenna [3]. However,
directional antennas can be designed so that receive power is an increasing function of frequency for highly directional
links [4]. The received power can be expressed in dBm as

\[ P_r\ \text{dBm} = P_t\ \text{dBm} + 10\log_{10}(G_l) + 20\log_{10}(\lambda) - 20\log_{10}(4\pi) - 20\log_{10}(d). \tag{2.8} \]

Free-space path loss is defined as the path loss of the free-space model:

\[ P_L\ \text{dB} = 10\log_{10} \frac{P_t}{P_r} = -10\log_{10} \frac{G_l \lambda^2}{(4\pi d)^2}. \tag{2.9} \]

The free-space path gain is thus

\[ P_G = -P_L = 10\log_{10} \frac{G_l \lambda^2}{(4\pi d)^2}. \tag{2.10} \]
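
The free-space formulas (2.8) and (2.9) translate directly into code. The sketch below is illustrative only: the
function names and the 2.4 GHz, 50 m link used to exercise them are assumptions, not values from the text.

```python
import math

C = 3e8  # speed of light in m/s

def free_space_path_loss_db(d_m, fc_hz, gain_linear=1.0):
    """Free-space path loss (2.9) in dB: -10 log10(G_l lambda^2 / (4 pi d)^2)."""
    wavelength = C / fc_hz
    return -10 * math.log10(gain_linear * wavelength ** 2 / (4 * math.pi * d_m) ** 2)

def received_power_dbm(pt_dbm, d_m, fc_hz, gain_linear=1.0):
    """Received power (2.8) in dBm for a transmit power given in dBm."""
    return pt_dbm - free_space_path_loss_db(d_m, fc_hz, gain_linear)

# Hypothetical link: 2.4 GHz carrier, 50 m separation, unit antenna gains, 20 dBm transmit power.
loss_50 = free_space_path_loss_db(50.0, 2.4e9)
loss_100 = free_space_path_loss_db(100.0, 2.4e9)
print(loss_50)                              # roughly 74 dB
print(loss_100 - loss_50)                   # doubling the distance costs about 6 dB
print(received_power_dbm(20.0, 50.0, 2.4e9))
```

The 6 dB penalty per doubling of distance is the dB form of the inverse-square falloff in (2.7).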



Example 2.1: Consider an indoor wireless LAN with $f_c = 900$ MHz, cells of radius 100 m, and nondirectional
antennas. Under the free-space path loss model, what transmit power is required at the access point such that all
terminals within the cell receive a minimum power of 10 μW? How does this change if the system frequency is 5 GHz?

Solution: We must find the transmit power such that the terminals at the cell boundary receive the minimum
required power. We obtain a formula for the required transmit power by inverting (2.7) to obtain:

\[ P_t = P_r \left[ \frac{4\pi d}{\sqrt{G_l}\, \lambda} \right]^2. \]

Substituting in $G_l = 1$ (nondirectional antennas), $\lambda = c/f_c = 0.33$ m, $d = 100$ m, and $P_r = 10$ μW yields
$P_t = 145.01$ W = 21.61 dBW. (Recall that P Watts equals $10\log_{10}[P]$ dBW, dB relative to one Watt, and
$10\log_{10}[P/0.001]$ dBm, dB relative to one milliwatt.) At 5 GHz only $\lambda = 0.06$ m changes, so
$P_t = 4.39$ kW = 36.42 dBW.

2.4 Ray Tracing



In a typical urban or indoor environment, a radio signal transmitted from a fixed source will encounter multiple 
objects in the environment that produce reflected, diffracted, or scattered copies of the transmitted signal, as shown 
in Figure 2.3. These additional copies of the transmitted signal, called multipath signal components, can be atten- 
uated in power, delayed in time, and shifted in phase and/or frequency from the LOS signal path at the receiver. 
The multipath and transmitted signal are summed together at the receiver, which often produces distortion in the 
received signal relative to the transmitted signal. 




Figure 2.3: Reflected, Diffracted, and Scattered Wave Components 

In ray tracing we assume a finite number of reflectors with known location and dielectric properties. The
details of the multipath propagation can then be solved using Maxwell’s equations with appropriate boundary
conditions. However, the computational complexity of this solution makes it impractical as a general modeling tool.
Ray tracing techniques approximate the propagation of electromagnetic waves by representing the wavefronts as
simple particles. Thus, the reflection, diffraction, and scattering effects on the wavefront are approximated using
simple geometric equations instead of Maxwell’s more complex wave equations. The error of the ray tracing
approximation is smallest when the receiver is many wavelengths from the nearest scatterer, and all the scatterers
are large relative to a wavelength and fairly smooth. Comparison of the ray tracing method with empirical data
shows it to accurately model received signal power in rural areas [10], along city streets where both the transmitter
and receiver are close to the ground [8, 7, 10], or in indoor environments with appropriately adjusted diffraction
coefficients [9]. Propagation effects besides received power variations, such as the delay spread of the multipath,
are not always well-captured with ray tracing techniques [11].

If the transmitter, receiver, and reflectors are all immobile then the impact of the multiple received signal
paths, and their delays relative to the LOS path, are fixed. However, if the source or receiver is moving, then
the characteristics of the multiple paths vary with time. These time variations are deterministic when the number,
location, and characteristics of the reflectors are known over time. Otherwise, statistical models must be used.
Similarly, if the number of reflectors is very large or the reflector surfaces are not smooth then we must use
statistical approximations to characterize the received signal. We will discuss statistical fading models for propagation
effects in Chapter 3. Hybrid models, which combine ray tracing and statistical fading, can also be found in the
literature [13, 14]; however, we will not describe them here.

The most general ray tracing model includes all attenuated, diffracted, and scattered multipath components.
This model uses all of the geometrical and dielectric properties of the objects surrounding the transmitter and
receiver. Computer programs based on ray tracing such as Lucent’s Wireless Systems Engineering software (WiSE),
Wireless Valley’s SitePlanner® and Marconi’s Planet® EV are widely used for system planning in both indoor
and outdoor environments. In these programs computer graphics are combined with aerial photographs (outdoor
channels) or architectural drawings (indoor channels) to obtain a 3D geometric picture of the environment [1].

The following sections describe several ray tracing models of increasing complexity. We start with a simple 
two-ray model that predicts signal variation resulting from a ground reflection interfering with the LOS path. This 
model characterizes signal propagation in isolated areas with few reflectors, such as rural roads or highways. It 
is not typically a good model for indoor environments. We then present a ten-ray reflection model that predicts 
the variation of a signal propagating along a straight street or hallway. Finally, we describe a general model that 
predicts signal propagation for any propagation environment. The two-ray model only requires information about 
the antenna heights, while the ten-ray model requires antenna height and street/hallway width information, and 
the general model requires these parameters as well as detailed information about the geometry and dielectric 
properties of the reflectors, diffractors, and scatterers in the environment. 

2.4.1 Two-Ray Model 

The two-ray model is used when a single ground reflection dominates the multipath effect, as illustrated in Fig- 
ure 2.4. The received signal consists of two components: the LOS component or ray, which is just the transmitted 
signal propagating through free space, and a reflected component or ray, which is the transmitted signal reflected 
off the ground. 



Figure 2.4: Two-Ray Model.



The received LOS ray is given by the free-space propagation loss formula (2.6). The reflected ray is shown in
Figure 2.4 by the segments x and x'. If we ignore the effect of surface wave attenuation² then, by superposition,
the received signal for the two-ray model is

\[ r_{2\text{ray}}(t) = \Re\left\{ \frac{\lambda}{4\pi} \left[ \frac{\sqrt{G_l}\, u(t) e^{-j2\pi l/\lambda}}{l}
   + \frac{R \sqrt{G_r}\, u(t - \tau) e^{-j2\pi(x + x')/\lambda}}{x + x'} \right] e^{j2\pi f_c t} \right\}, \tag{2.11} \]

where $\tau = (x + x' - l)/c$ is the time delay of the ground reflection relative to the LOS ray, $\sqrt{G_l} = \sqrt{G_a G_b}$
is the product of the transmit and receive antenna field radiation patterns in the LOS direction, R is the ground
reflection coefficient, and $\sqrt{G_r} = \sqrt{G_c G_d}$ is the product of the transmit and receive antenna field radiation
patterns corresponding to the rays of length x and x', respectively. The delay spread of the two-ray model equals the
delay between the LOS ray and the reflected ray: $(x + x' - l)/c$.

² This is a valid approximation for antennas located more than a few wavelengths from the ground.

If the transmitted signal is narrowband relative to the delay spread ($\tau \ll B_u^{-1}$) then $u(t) \approx u(t - \tau)$. With
this approximation, the received power of the two-ray model for narrowband transmission is

\[ P_r = P_t \left[ \frac{\lambda}{4\pi} \right]^2 \left| \frac{\sqrt{G_l}}{l}
   + \frac{R \sqrt{G_r}\, e^{-j\Delta\phi}}{x + x'} \right|^2, \tag{2.12} \]

where $\Delta\phi = 2\pi(x + x' - l)/\lambda$ is the phase difference between the two received signal components. Equation
(2.12) has been shown to agree very closely with empirical data [15]. If d denotes the horizontal separation of the
antennas, $h_t$ denotes the transmitter height, and $h_r$ denotes the receiver height, then using geometry we can show
that

\[ x + x' - l = \sqrt{(h_t + h_r)^2 + d^2} - \sqrt{(h_t - h_r)^2 + d^2}. \tag{2.13} \]

When d is very large compared to $h_t + h_r$ we can use a Taylor series approximation in (2.13) to get

\[ \Delta\phi = \frac{2\pi(x + x' - l)}{\lambda} \approx \frac{4\pi h_t h_r}{\lambda d}. \tag{2.14} \]



The ground reflection coefficient is given by [2, 16]

\[ R = \frac{\sin\theta - Z}{\sin\theta + Z}, \tag{2.15} \]

where

\[ Z = \begin{cases}
\sqrt{\epsilon_r - \cos^2\theta}/\epsilon_r & \text{for vertical polarization} \\
\sqrt{\epsilon_r - \cos^2\theta} & \text{for horizontal polarization}
\end{cases} \tag{2.16} \]

and $\epsilon_r$ is the dielectric constant of the ground. For earth or road surfaces this dielectric constant is approximately
that of a pure dielectric (for which $\epsilon_r$ is real with a value of about 15).
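
A small sketch of (2.15) and (2.16) makes the grazing-angle behaviour concrete. The function name, the string
argument used to select the polarization, and the default value of 15 for the dielectric constant are illustrative
assumptions. A complex square root is used so that a complex dielectric constant could also be passed in.

```python
import cmath
import math

def ground_reflection_coefficient(theta_rad, eps_r=15.0, polarization="horizontal"):
    """Reflection coefficient R = (sin(theta) - Z) / (sin(theta) + Z) from (2.15)-(2.16)."""
    root = cmath.sqrt(eps_r - math.cos(theta_rad) ** 2)
    z = root / eps_r if polarization == "vertical" else root
    return (math.sin(theta_rad) - z) / (math.sin(theta_rad) + z)

# At grazing incidence (theta = 0) the coefficient is -1 for either polarization.
print(ground_reflection_coefficient(0.0, polarization="vertical"))
print(ground_reflection_coefficient(0.0, polarization="horizontal"))
# At a steeper angle the magnitude drops below one.
print(abs(ground_reflection_coefficient(math.radians(30), polarization="vertical")))
```

The value R = -1 at theta = 0 is the limit that applies when the antennas are far apart relative to their heights.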

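We see from Figure 2.4 and (2.15) that for asymptotically large d, $x + x' \approx l \approx d$, $\theta \approx 0$, $G_l \approx G_r$, and
$R \approx -1$. Substituting these approximations into (2.12) yields that, in this asymptotic limit, the received signal
power is approximately

\[ P_r \approx \left[ \frac{\lambda \sqrt{G_l}}{4\pi d} \right]^2 \left[ \frac{4\pi h_t h_r}{\lambda d} \right]^2 P_t
   = \left[ \frac{\sqrt{G_l}\, h_t h_r}{d^2} \right]^2 P_t, \tag{2.17} \]

or, in dB, we have

\[ P_r\ \text{dBm} = P_t\ \text{dBm} + 10\log_{10}(G_l) + 20\log_{10}(h_t h_r) - 40\log_{10}(d). \tag{2.18} \]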

Thus, in the limit of asymptotically large d, the received power falls off inversely with the fourth power of d and is
independent of the wavelength $\lambda$. The received signal becomes independent of $\lambda$ since combining the direct path
and reflected signal is similar to the effect of an antenna array, and directional antennas have a received power that
does not necessarily decrease with frequency. A plot of (2.12) as a function of distance is illustrated in Figure 2.5
for $f = 900$ MHz, $R = -1$, $h_t = 50$ m, $h_r = 2$ m, $G_l = 1$, $G_r = 1$ and transmit power normalized so that the plot
starts at 0 dBm. This plot can be separated into three segments. For small distances ($d < h_t$) the two rays add
constructively and the path loss is roughly flat. More precisely, it is proportional to $1/(d^2 + h_t^2)$ since, at these small
distances, the distance between the transmitter and receiver is $l = \sqrt{d^2 + (h_t - h_r)^2}$ and thus $1/l^2 \approx 1/(d^2 + h_t^2)$
for $h_t \gg h_r$, which is typically the case. For distances bigger than $h_t$ and up to a certain critical distance $d_c$,
the wave experiences constructive and destructive interference of the two rays, resulting in a wave pattern with
a sequence of maxima and minima. These maxima and minima are also referred to as small-scale or multipath
fading, discussed in more detail in the next chapter. At the critical distance $d_c$ the final maximum is reached, after
which the signal power falls off proportionally to $d^{-4}$. This rapid falloff with distance is due to the fact that for
$d > d_c$ the signal components only combine destructively, so they are out of phase by at least $\pi$. An approximation
for $d_c$ can be obtained by setting $\Delta\phi = \pi$ in (2.14), obtaining $d_c = 4 h_t h_r/\lambda$, which is also shown in the figure.
The power falloff with distance in the two-ray model can be approximated by averaging out its local maxima and
minima. This results in a piecewise linear model with three segments, which is also shown in Figure 2.5 slightly
offset from the actual power falloff curve for illustration purposes. In the first segment power falloff is constant
and proportional to $1/(d^2 + h_t^2)$, for distances between $h_t$ and $d_c$ power falls off at -20 dB/decade, and at distances
greater than $d_c$ power falls off at -40 dB/decade.
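
The narrowband two-ray power (2.12) and its large-distance limit (2.17) can be compared numerically. The sketch
below is illustrative only: the function names are assumptions, the path lengths follow the geometry behind (2.13),
and the parameters are those quoted for Figure 2.5 (900 MHz, R = -1, a 50 m transmitter, a 2 m receiver, unit gains).

```python
import cmath
import math

C = 3e8  # speed of light in m/s

def two_ray_power_ratio(d, ht, hr, fc, refl_coeff=-1.0, g_los=1.0, g_ref=1.0):
    """P_r / P_t from (2.12), with path lengths taken from the geometry behind (2.13)."""
    lam = C / fc
    los = math.hypot(d, ht - hr)           # length l of the LOS ray
    refl = math.hypot(d, ht + hr)          # length x + x' of the ground-reflected ray
    dphi = 2 * math.pi * (refl - los) / lam
    field = math.sqrt(g_los) / los + refl_coeff * math.sqrt(g_ref) * cmath.exp(-1j * dphi) / refl
    return (lam / (4 * math.pi)) ** 2 * abs(field) ** 2

def two_ray_asymptotic_ratio(d, ht, hr, g_los=1.0):
    """Large-d approximation (2.17): P_r / P_t is about (sqrt(G_l) h_t h_r / d^2)^2."""
    return (math.sqrt(g_los) * ht * hr / d ** 2) ** 2

exact = two_ray_power_ratio(1e5, 50.0, 2.0, 900e6)
approx = two_ray_asymptotic_ratio(1e5, 50.0, 2.0)
print(exact / approx)                      # close to 1 far beyond the critical distance

drop_db = 10 * math.log10(exact / two_ray_power_ratio(1e6, 50.0, 2.0, 900e6))
print(drop_db)                             # close to 40 dB per decade of distance
```

Evaluating `two_ray_power_ratio` on a logarithmic grid of distances should reproduce the qualitative shape described
above: a flat region, an oscillating region, and the final -40 dB/decade slope.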

The critical distance d c can be used for system design. For example, if propagation in a cellular system obeys 
the two-ray model then the critical distance would be a natural size for the cell radius, since the path loss associated 
with interference outside the cell would be much larger than path loss for desired signals inside the cell. However, 
setting the cell radius to d c could result in very large cells, as illustrated in Figure 2.5 and in the next example. Since 
smaller cells are more desirable, both to increase capacity and reduce transmit power, cell radii are typically much 
smaller than d c . Thus, with a two-ray propagation model, power falloff within these relatively small cells goes as 
distance squared. Moreover, propagation in cellular systems rarely follows a two-ray model, since cancellation by 
reflected rays rarely occurs in all directions. 
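
The critical distance and the three-segment approximation of the two-ray falloff are simple enough to sketch in a
few lines. The function names are illustrative, and the normalization of the piecewise model to 0 dB for distances
up to the transmitter height is an assumption made here for convenience; the text only fixes the slopes.

```python
import math

C = 3e8  # speed of light in m/s

def critical_distance(ht, hr, fc):
    """d_c = 4 h_t h_r / lambda, obtained by setting the phase difference in (2.14) to pi."""
    return 4 * ht * hr * fc / C

def piecewise_falloff_db(d, ht, dc):
    """Relative received power in dB, normalized (an assumption) to 0 dB for d <= h_t."""
    if d <= ht:
        return 0.0                                   # first segment: roughly flat
    if d <= dc:
        return -20 * math.log10(d / ht)              # second segment: -20 dB/decade
    return -20 * math.log10(dc / ht) - 40 * math.log10(d / dc)   # third: -40 dB/decade

# Parameters quoted for Figure 2.5: 900 MHz carrier, 50 m transmitter, 2 m receiver.
dc = critical_distance(50.0, 2.0, 900e6)
print(dc)                                            # 1200 m
print(piecewise_falloff_db(500.0, 50.0, dc))         # one decade past h_t: -20 dB
print(piecewise_falloff_db(10 * dc, 50.0, dc) - piecewise_falloff_db(dc, 50.0, dc))  # -40 dB
```

A critical distance of 1200 m for these parameters illustrates why cells sized to d_c would be large.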



Figure 2.5: Received Power versus Distance for Two-Ray Model.



Example 2.2: Determine the critical distance for the two-ray model in an urban microcell ($h_t = 10$ m, $h_r = 3$ m)
and an indoor microcell ($h_t = 3$ m, $h_r = 2$ m) for $f_c = 2$ GHz.

Solution: $d_c = 4 h_t h_r/\lambda = 800$ meters for the urban microcell and 160 meters for the indoor system. A cell
radius of 800 m in an urban microcell system is a bit large: urban microcells today are on the order of 100 m to
maintain large capacity. However, if we used a cell size of 800 m under these system parameters, signal power
would fall off as $d^2$ inside the cell, and interference from neighboring cells would fall off as $d^4$, and thus would be
greatly reduced. Similarly, 160 m is quite large for the cell radius of an indoor system, as there would typically be
many walls the signal would have to go through for an indoor cell radius of that size. So an indoor system would
typically have a smaller cell radius, on the order of 10-20 m.

2.4.2 Ten-Ray Model (Dielectric Canyon)



We now examine a model for urban microcells developed by Amitay [8]. This model assumes rectilinear streets³
with buildings along both sides of the street and transmitter and receiver antenna heights that are close to street
level. The building-lined streets act as a dielectric canyon to the propagating signal. Theoretically, an infinite
number of rays can be reflected off the building fronts to arrive at the receiver; in addition, rays may also be
back-reflected from buildings behind the transmitter or receiver. However, since some of the signal energy is dissipated
with each reflection, signal paths corresponding to more than three reflections can generally be ignored. When
the street layout is relatively straight, back reflections are usually negligible also. Experimental data show that a
model of ten reflection rays closely approximates signal propagation through the dielectric canyon [8]. The ten rays
incorporate all paths with one, two, or three reflections: specifically, there is the LOS, the ground-reflected (GR),
the single-wall (SW) reflected, the double-wall (DW) reflected, the triple-wall (TW) reflected, the wall-ground
(WG) reflected and the ground-wall (GW) reflected paths. There are two of each type of wall-reflected path, one
for each side of the street. An overhead view of the ten-ray model is shown in Figure 2.6.




Figure 2.6: Overhead View of the Ten-Ray Model. 



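For the ten-ray model, the received signal is given by

\[ r_{10\text{ray}}(t) = \Re\left\{ \frac{\lambda}{4\pi} \left[ \frac{\sqrt{G_l}\, u(t) e^{-j2\pi l/\lambda}}{l}
   + \sum_{i=1}^{9} \frac{R_i \sqrt{G_{x_i}}\, u(t - \tau_i) e^{-j2\pi x_i/\lambda}}{x_i} \right] e^{j2\pi f_c t} \right\}, \tag{2.19} \]

where $x_i$ denotes the path length of the ith reflected ray, $\tau_i = (x_i - l)/c$, and $\sqrt{G_{x_i}}$ is the product of the transmit
and receive antenna gains corresponding to the ith ray. For each reflection path, the coefficient $R_i$ is either a single
reflection coefficient given by (2.15) or, if the path corresponds to multiple reflections, the product of the reflection
coefficients corresponding to each reflection. The dielectric constants used in (2.15) are approximately the same
as the ground dielectric, so $\epsilon_r = 15$ is used for all the calculations of $R_i$. If we again assume a narrowband model
such that $u(t) \approx u(t - \tau_i)$ for all i, then the received power corresponding to (2.19) is

\[ P_r = P_t \left[ \frac{\lambda}{4\pi} \right]^2 \left| \frac{\sqrt{G_l}}{l}
   + \sum_{i=1}^{9} \frac{R_i \sqrt{G_{x_i}}\, e^{-j\Delta\phi_i}}{x_i} \right|^2, \tag{2.20} \]

where $\Delta\phi_i = 2\pi(x_i - l)/\lambda$.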

Power falloff with distance in both the ten-ray model (2.20) and urban empirical data [15, 50, 51] for transmit
antennas both above and below the building skyline is typically proportional to $d^{-2}$, even at relatively large
distances. Moreover, this falloff exponent is relatively insensitive to the transmitter height. This falloff with distance
squared is due to the dominance of the multipath rays which decay as $d^{-2}$, over the combination of the LOS and
ground-reflected rays (the two-ray model), which decays as $d^{-4}$. Other empirical studies [17, 52, 53] have obtained
power falloff with distance proportional to $d^{-\gamma}$, where $\gamma$ lies anywhere between two and six.



³ A rectilinear city is flat, with linear streets that intersect at 90° angles, as in midtown Manhattan.



2.4.3 General Ray Tracing 



General Ray Tracing (GRT) can be used to predict field strength and delay spread for any building configuration and 
antenna placement [12, 36, 37]. For this model, the building database (height, location, and dielectric properties) 
and the transmitter and receiver locations relative to the buildings must be specified exactly. Since this information 
is site-specific, the GRT model is not used to obtain general theories about system performance and layout; rather, it 
explains the basic mechanism of urban propagation, and can be used to obtain delay and signal strength information 
for a particular transmitter and receiver configuration in a given environment. 

The GRT method uses geometrical optics to trace the propagation of the LOS and reflected signal components, 
as well as signal components from building diffraction and diffuse scattering. There is no limit to the number of 
multipath components at a given receiver location: the strength of each component is derived explicitly based 
on the building locations and dielectric properties. In general, the LOS and reflected paths provide the dominant 
components of the received signal, since diffraction and scattering losses are high. However, in regions close to 
scattering or diffracting surfaces, which may be blocked from the LOS and reflecting rays, these other multipath 
components may dominate. 

The propagation model for the LOS and reflected paths was outlined in the previous section. Diffraction
occurs when the transmitted signal “bends around” an object in its path to the receiver, as shown in Figure 2.7.
Diffraction results from many phenomena, including the curved surface of the earth, hilly or irregular terrain,
building edges, or obstructions blocking the LOS path between the transmitter and receiver [16, 3, 1]. Diffraction can
be accurately characterized using the geometrical theory of diffraction (GTD) [40]; however, the complexity of this
approach has precluded its use in wireless channel modeling. Wedge diffraction simplifies the GTD by assuming
the diffracting object is a wedge rather than a more general shape. This model has been used to characterize the
mechanism by which signals are diffracted around street corners, which can result in path loss exceeding 100 dB
for some incident angles on the wedge [9, 37, 38, 39]. Although wedge diffraction simplifies the GTD, it still
requires a numerical solution for path loss [40, 41] and thus is not commonly used. Diffraction is most commonly
modeled by the Fresnel knife edge diffraction model due to its simplicity. The geometry of this model is shown
in Figure 2.7, where the diffracting object is assumed to be asymptotically thin, which is not generally the case for
hills, rough terrain, or wedge diffractors. In particular, this model does not consider diffractor parameters such as
polarization, conductivity, and surface roughness, which can lead to inaccuracies [38]. The geometry of Figure 2.7
indicates that the diffracted signal travels distance $d + d'$ resulting in a phase shift of $\phi = 2\pi(d + d')/\lambda$. It
also indicates that for h small relative to d and d', the signal must travel an additional distance
relative to the LOS path of approximately

\[ \Delta d \approx \frac{h^2}{2} \frac{d + d'}{d d'}, \]

and the corresponding phase shift relative to the LOS path is approximately

\[ \Delta\phi = \frac{2\pi \Delta d}{\lambda} = \frac{\pi}{2} v^2, \tag{2.21} \]

where

\[ v = h \sqrt{\frac{2(d + d')}{\lambda d d'}} \tag{2.22} \]



is called the Fresnel-Kirchhoff diffraction parameter. The path loss associated with knife-edge diffraction is
generally a function of v. However, computing this diffraction path loss is fairly complex, requiring the use of
Huygens’ principle, Fresnel zones, and the complex Fresnel integral [3]. Moreover, the resulting diffraction loss
cannot generally be found in closed form. Approximations for knife-edge diffraction path loss (in dB) relative to
LOS path loss are given by Lee [16, Chapter 2] as

\[ L(v)\ \text{dB} = \begin{cases}
20\log_{10}[0.5 - 0.62v] & -0.8 \le v < 0 \\
20\log_{10}[0.5 e^{-0.95v}] & 0 \le v < 1 \\
20\log_{10}\left[0.4 - \sqrt{0.1184 - (0.38 - 0.1v)^2}\right] & 1 \le v \le 2.4 \\
20\log_{10}[0.225/v] & v > 2.4
\end{cases} \tag{2.23} \]


A similar approximation can be found in [42]. The knife-edge diffraction model yields the following formula for
the received diffracted signal:

\[ r(t) = \Re\left\{ L(v) \sqrt{G_d}\, u(t - \tau) e^{-j2\pi(d + d')/\lambda} e^{j2\pi f_c t} \right\}, \tag{2.24} \]

where $\sqrt{G_d}$ is the antenna gain and $\tau = \Delta d/c$ is the delay associated with the diffracted ray relative to the LOS
path.




Figure 2.7: Knife-Edge Diffraction. 

In addition to diffracted rays, there may also be rays that are diffracted multiple times, or rays that are both 
reflected and diffracted. Models exist for including all possible permutations of reflection and diffraction [43]; 
however, the attenuation of the corresponding signal components is generally so large that these components are 
negligible relative to the noise. Diffraction models can also be specialized to a given environment. For example, 
a model for diffraction from rooftops and buildings in cellular systems was developed by Walfisch and Bertoni in
[57].




Figure 2.8: Scattering. 



A scattered ray, shown in Figure 2.8 by the segments s' and s, has a path loss proportional to the product of s
and s'. This multiplicative dependence is due to the additional spreading loss the ray experiences after scattering.
The received signal due to a scattered ray is given by the bistatic radar equation [44]:

\[ r(t) = \Re\left\{ u(t - \tau) \frac{\lambda \sqrt{G_s \sigma}\, e^{-j2\pi(s + s')/\lambda}}{(4\pi)^{3/2} s s'}
   e^{j2\pi f_c t} \right\}, \tag{2.25} \]

where $\tau = (s + s' - l)/c$ is the delay associated with the scattered ray, $\sigma$ (in m²) is the radar cross section of the
scattering object, which depends on the roughness, size, and shape of the scatterer, and $\sqrt{G_s}$ is the antenna gain.
The model assumes that the signal propagates from the transmitter to the scatterer based on free space propagation,
and is then reradiated by the scatterer with transmit power equal to $\sigma$ times the received power at the scatterer.
From (2.25) the path loss associated with scattering is

\[ P_r\ \text{dBm} = P_t\ \text{dBm} + 10\log_{10}(G_s) + 20\log_{10}(\lambda) + 10\log_{10}(\sigma)
   - 30\log_{10}(4\pi) - 20\log_{10}(s) - 20\log_{10}(s'). \tag{2.26} \]

Empirical values of $10\log_{10}\sigma$ were determined in [45] for different buildings in several cities. Results from this
study indicate that $10\log_{10}\sigma$ in dBm² ranges from -4.5 dBm² to 55.7 dBm², where dBm² denotes the dB value
of the $\sigma$ measurement with respect to one square meter.
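
The dB expression (2.26) follows from squaring the amplitude factor in (2.25). The sketch below checks that
correspondence numerically; the function names and the example values (a 0.3 m wavelength, a 100 square meter radar
cross section, and 100 m segments) are illustrative assumptions.

```python
import math

def scattered_power_dbm(pt_dbm, gs, wavelength, rcs_m2, s, s_prime):
    """Received power of a scattered ray in dBm, term by term as in (2.26)."""
    return (pt_dbm
            + 10 * math.log10(gs)
            + 20 * math.log10(wavelength)
            + 10 * math.log10(rcs_m2)
            - 30 * math.log10(4 * math.pi)
            - 20 * math.log10(s)
            - 20 * math.log10(s_prime))

def scattered_power_ratio(gs, wavelength, rcs_m2, s, s_prime):
    """Linear P_r / P_t implied by the squared amplitude factor in (2.25)."""
    return gs * rcs_m2 * wavelength ** 2 / ((4 * math.pi) ** 3 * s ** 2 * s_prime ** 2)

# Hypothetical scatterer: 20 dBm^2 cross section, 100 m from both transmitter and receiver.
p_db = scattered_power_dbm(0.0, 1.0, 0.3, 100.0, 100.0, 100.0)
p_lin = scattered_power_ratio(1.0, 0.3, 100.0, 100.0, 100.0)
print(p_db, 10 * math.log10(p_lin))   # the two values agree
```

Because the loss grows with the product of s and s', a scatterer midway along a path of fixed total length gives the
weakest scattered component.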

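The received signal is determined from the superposition of all the components due to the multiple rays. Thus,
if we have a LOS ray, $N_r$ reflected rays, $N_d$ diffracted rays, and $N_s$ diffusely scattered rays, the total received signal
is

\[
\begin{aligned}
r_{\text{total}}(t) = \Re\Bigg\{ \Bigg[ & \frac{\lambda}{4\pi} \Bigg( \frac{\sqrt{G_l}\, u(t) e^{-j2\pi l/\lambda}}{l}
   + \sum_{i=1}^{N_r} \frac{R_i \sqrt{G_{x_i}}\, u(t - \tau_i) e^{-j2\pi x_i/\lambda}}{x_i} \Bigg) \\
 & + \sum_{j=1}^{N_d} L_j(v) \sqrt{G_{d_j}}\, u(t - \tau_j) e^{-j2\pi(d_j + d_j')/\lambda} \\
 & + \sum_{k=1}^{N_s} \frac{\lambda \sqrt{G_{s_k} \sigma_k}\, u(t - \tau_k) e^{-j2\pi(s_k + s_k')/\lambda}}{(4\pi)^{3/2} s_k s_k'}
   \Bigg] e^{j2\pi f_c t} \Bigg\},
\end{aligned}
\tag{2.27}
\]

where $\tau_i$, $\tau_j$, and $\tau_k$ are, respectively, the time delays of the given reflected, diffracted, or scattered ray normalized
to the delay of the LOS ray, as defined above. The received power $P_r$ of $r_{\text{total}}(t)$ and the corresponding path loss
$P_r/P_t$ are then obtained from (2.27).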

Any of these multipath components may have an additional attenuation factor if its propagation path is blocked 
by buildings or other objects. In this case, the attenuation factor of the obstructing object multiplies the compo- 
nent’s path loss term in (2.27). This attenuation loss will vary widely, depending on the material and depth of the 
object [1, 46]. Models for random loss due to attenuation are described in Section 2.7. 



2.4.4 Local Mean Received Power 

The path loss computed from all ray tracing models is associated with a fixed transmitter and receiver location. In
addition, ray tracing can be used to compute the local mean received power $\overline{P}_r$ in the vicinity of a given receiver
location by adding the squared magnitude of all the received rays. This has the effect of averaging out local spatial
variations due to phase changes around the given location. Local mean received power is a good indicator of link
quality and is often used in cellular system functions like power control and handoff [47].



2.5 Empirical Path Loss Models 

Most mobile communication systems operate in complex propagation environments that cannot be accurately
modeled by free-space path loss or ray tracing. A number of path loss models have been developed over the years
to predict path loss in typical wireless environments such as large urban macrocells, urban microcells, and, more
recently, inside buildings [1, Chapter 3]. These models are mainly based on empirical measurements over a given
distance in a given frequency range and a particular geographical area or building. However, applications of these
models are not always restricted to environments in which the empirical measurements were made, which makes
the accuracy of such empirically-based models applied to more general environments somewhat questionable.
Nevertheless, many wireless systems use these models as a basis for performance analysis. In our discussion
below we will begin with common models for urban macrocells, then describe more recent models for outdoor
microcells and indoor propagation.

Analytical models characterize $P_r/P_t$ as a function of distance, so path loss is well defined. In contrast,
empirical measurements of $P_r/P_t$ as a function of distance include the effects of path loss, shadowing, and
multipath. In order to remove multipath effects, empirical measurements for path loss typically average their received
power measurements and the corresponding path loss at a given distance over several wavelengths. This average
path loss is called the local mean attenuation (LMA) at distance $d$, and generally decreases with $d$ due to free
space path loss and signal obstructions. The LMA in a given environment, like a city, depends on the specific
location of the transmitter and receiver corresponding to the LMA measurement. To characterize LMA more
generally, measurements are typically taken throughout the environment, and possibly in multiple environments with
similar characteristics. Thus, the empirical path loss $P_L(d)$ for a given environment (e.g. a city, suburban area,
or office building) is defined as the average of the LMA measurements at distance $d$, averaged over all available
measurements in the given environment. For example, empirical path loss for a generic downtown area with a
rectangular street grid might be obtained by averaging LMA measurements in New York City, downtown San
Francisco, and downtown Chicago. The empirical path loss models given below are all obtained from average
LMA measurements.



2.5.1 The Okumura Model 



One of the most common models for signal prediction in large urban macrocells is the Okumura model [55].
This model is applicable over distances of 1-100 km and frequency ranges of 150-1500 MHz. Okumura used
extensive measurements of base station-to-mobile signal attenuation throughout Tokyo to develop a set of curves
giving median attenuation relative to free space of signal propagation in irregular terrain. The base station heights
for these measurements were 30-100 m, the upper end of which is higher than typical base stations today. The
empirical path loss formula of Okumura at distance $d$ parameterized by the carrier frequency $f_c$ is given by

$$P_L(d)\ \mathrm{dB} = L(f_c, d) + A_{mu}(f_c, d) - G(h_t) - G(h_r) - G_{AREA}, \tag{2.28}$$



where $L(f_c, d)$ is free space path loss at distance $d$ and carrier frequency $f_c$, $A_{mu}(f_c, d)$ is the median attenuation
in addition to free space path loss across all environments, $G(h_t)$ is the base station antenna height gain factor,
$G(h_r)$ is the mobile antenna height gain factor, and $G_{AREA}$ is the gain due to the type of environment. The values
of $A_{mu}(f_c, d)$ and $G_{AREA}$ are obtained from Okumura's empirical plots [55, 1]. Okumura derived empirical
formulas for $G(h_t)$ and $G(h_r)$ as

$$G(h_t) = 20\log_{10}(h_t/200), \qquad 30\ \mathrm{m} < h_t < 1000\ \mathrm{m}, \tag{2.29}$$

$$G(h_r) = \begin{cases} 10\log_{10}(h_r/3), & h_r \le 3\ \mathrm{m} \\ 20\log_{10}(h_r/3), & 3\ \mathrm{m} < h_r < 10\ \mathrm{m}. \end{cases} \tag{2.30}$$



Correction factors related to terrain are also developed in [55] that improve the model accuracy. Okumura’s model 
has a 10-14 dB empirical standard deviation between the path loss predicted by the model and the path loss 
associated with one of the measurements used to develop the model. 
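The terms of (2.28)-(2.30) can be combined in a few lines once $A_{mu}$ and $G_{AREA}$ have been read off Okumura's curves. The sketch below assumes those two values are supplied by the caller; the function names are hypothetical, the free space loss is written as $20\log_{10}(4\pi d/\lambda)$, and the curve readings used in the example (43 dB and 9 dB) are placeholders rather than values from the text.

```python
import math

C = 3e8  # speed of light in m/s (rounded)


def free_space_loss_db(fc_hz, d_m):
    """Free space path loss L(f_c, d) in dB, as a positive loss."""
    return 20 * math.log10(4 * math.pi * d_m * fc_hz / C)


def gain_ht_db(ht_m):
    """Base station height gain factor G(h_t) from (2.29)."""
    if not 30 < ht_m < 1000:
        raise ValueError("h_t outside the 30-1000 m range of (2.29)")
    return 20 * math.log10(ht_m / 200)


def gain_hr_db(hr_m):
    """Mobile height gain factor G(h_r) from (2.30)."""
    if not 0 < hr_m < 10:
        raise ValueError("h_r outside the range of (2.30)")
    if hr_m <= 3:
        return 10 * math.log10(hr_m / 3)
    return 20 * math.log10(hr_m / 3)


def okumura_path_loss_db(fc_hz, d_m, ht_m, hr_m, a_mu_db, g_area_db):
    """Median path loss (2.28); a_mu_db and g_area_db come from Okumura's plots."""
    return (free_space_loss_db(fc_hz, d_m) + a_mu_db
            - gain_ht_db(ht_m) - gain_hr_db(hr_m) - g_area_db)


# Placeholder curve readings: A_mu = 43 dB and G_AREA = 9 dB are illustrative only.
loss_db = okumura_path_loss_db(900e6, 50e3, 100.0, 6.0, a_mu_db=43.0, g_area_db=9.0)
```

Note that a base station below 200 m gives a negative $G(h_t)$, which increases the predicted loss.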



2.5.2 Hata Model 

The Hata model [54] is an empirical formulation of the graphical path loss data provided by Okumura and is
valid over roughly the same range of frequencies, 150-1500 MHz. This empirical model simplifies calculation of
path loss since it is a closed-form formula and is not based on empirical curves for the different parameters. The
standard formula for empirical path loss in urban areas under the Hata model is

$$P_{L,urban}(d)\ \mathrm{dB} = 69.55 + 26.16\log_{10}(f_c) - 13.82\log_{10}(h_t) - a(h_r) + (44.9 - 6.55\log_{10}(h_t))\log_{10}(d). \tag{2.31}$$

The parameters in this model are the same as under the Okumura model, and $a(h_r)$ is a correction factor for the
mobile antenna height based on the size of the coverage area. For small to medium sized cities, this factor is given
by [54, 1]

$$a(h_r) = (1.1\log_{10}(f_c) - 0.7)h_r - (1.56\log_{10}(f_c) - 0.8)\ \mathrm{dB},$$

and for larger cities at frequencies $f_c > 300$ MHz by

$$a(h_r) = 3.2(\log_{10}(11.75h_r))^2 - 4.97\ \mathrm{dB}.$$

Corrections to the urban model are made for suburban and rural propagation, so that these models are, respectively,

$$P_{L,suburban}(d) = P_{L,urban}(d) - 2[\log_{10}(f_c/28)]^2 - 5.4 \tag{2.32}$$

and

$$P_{L,rural}(d) = P_{L,urban}(d) - 4.78[\log_{10}(f_c)]^2 + 18.33\log_{10}(f_c) - K, \tag{2.33}$$

where $K$ ranges from 35.94 (countryside) to 40.94 (desert). The Hata model does not provide for any path-specific
correction factors, as are available in the Okumura model. The Hata model closely approximates the Okumura model
for distances $d > 1$ km. Thus, it is a good model for first generation cellular systems, but does not model
propagation well in current cellular systems with smaller cell sizes and higher frequencies. Indoor environments
are also not captured with the Hata model.
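Since the Hata formulas are closed form, they translate directly into code. The following sketch implements (2.31)-(2.33) together with the two antenna correction factors. It assumes the usual Hata unit convention, which the text above does not state explicitly: $f_c$ in MHz, $d$ in km, and antenna heights in meters. All function and parameter names are hypothetical.

```python
import math


def hata_mobile_correction_db(fc_mhz, hr_m, large_city=False):
    """Mobile antenna correction a(h_r); the large-city form applies for f_c > 300 MHz."""
    if large_city:
        return 3.2 * math.log10(11.75 * hr_m) ** 2 - 4.97
    log_fc = math.log10(fc_mhz)
    return (1.1 * log_fc - 0.7) * hr_m - (1.56 * log_fc - 0.8)


def hata_urban_db(fc_mhz, ht_m, hr_m, d_km, large_city=False):
    """Urban path loss (2.31). Units assumed: MHz, m, m, km."""
    return (69.55 + 26.16 * math.log10(fc_mhz) - 13.82 * math.log10(ht_m)
            - hata_mobile_correction_db(fc_mhz, hr_m, large_city)
            + (44.9 - 6.55 * math.log10(ht_m)) * math.log10(d_km))


def hata_suburban_db(fc_mhz, ht_m, hr_m, d_km):
    """Suburban correction (2.32) applied to the urban model."""
    return hata_urban_db(fc_mhz, ht_m, hr_m, d_km) - 2 * math.log10(fc_mhz / 28) ** 2 - 5.4


def hata_rural_db(fc_mhz, ht_m, hr_m, d_km, k=40.94):
    """Rural correction (2.33); k ranges from 35.94 (countryside) to 40.94 (desert)."""
    log_fc = math.log10(fc_mhz)
    return hata_urban_db(fc_mhz, ht_m, hr_m, d_km) - 4.78 * log_fc ** 2 + 18.33 * log_fc - k


# Illustrative inputs: 900 MHz, 30 m base station, 1.5 m mobile, 1 km range.
urban = hata_urban_db(900.0, 30.0, 1.5, 1.0)
suburban = hata_suburban_db(900.0, 30.0, 1.5, 1.0)
rural = hata_rural_db(900.0, 30.0, 1.5, 1.0)
```

For these inputs the urban loss comes out near 126.4 dB, with the suburban and rural predictions progressively lower, as the correction terms imply.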

2.5.3 COST 231 Extension to Hata Model 

The Hata model was extended by the European cooperative for scientific and technical research (EURO-COST) to
2 GHz as follows [56]:

$$P_{L,urban}(d)\ \mathrm{dB} = 46.3 + 33.9\log_{10}(f_c) - 13.82\log_{10}(h_t) - a(h_r) + (44.9 - 6.55\log_{10}(h_t))\log_{10}(d) + C_M, \tag{2.34}$$

where $a(h_r)$ is the same correction factor as before and $C_M$ is 0 dB for medium sized cities and suburbs, and 3 dB
for metropolitan areas. This model is referred to as the COST 231 extension to the Hata model, and is restricted
to the following range of parameters: $1.5\ \mathrm{GHz} < f_c < 2\ \mathrm{GHz}$, $30\ \mathrm{m} < h_t < 200\ \mathrm{m}$, $1\ \mathrm{m} < h_r < 10\ \mathrm{m}$, and
$1\ \mathrm{km} < d < 20\ \mathrm{km}$.
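A short sketch of (2.34) with its validity range enforced is given below. As with the Hata model, it assumes $f_c$ in MHz, $d$ in km, and heights in meters, and it uses the small-to-medium city form of $a(h_r)$. The function names are hypothetical, and treating the range limits as inclusive is a choice made here, not something the text specifies.

```python
import math


def mobile_correction_db(fc_mhz, hr_m):
    """Small-to-medium city a(h_r), the same correction factor as in the Hata model."""
    log_fc = math.log10(fc_mhz)
    return (1.1 * log_fc - 0.7) * hr_m - (1.56 * log_fc - 0.8)


def cost231_hata_db(fc_mhz, ht_m, hr_m, d_km, metropolitan=False):
    """COST 231 extension (2.34). Units assumed: MHz, m, m, km."""
    # Range limits from the text; bounds are treated as inclusive here (an assumption).
    if not (1500 <= fc_mhz <= 2000 and 30 <= ht_m <= 200
            and 1 <= hr_m <= 10 and 1 <= d_km <= 20):
        raise ValueError("parameters outside the COST 231 validity range")
    c_m = 3.0 if metropolitan else 0.0  # C_M: 3 dB metropolitan, 0 dB otherwise
    return (46.3 + 33.9 * math.log10(fc_mhz) - 13.82 * math.log10(ht_m)
            - mobile_correction_db(fc_mhz, hr_m)
            + (44.9 - 6.55 * math.log10(ht_m)) * math.log10(d_km) + c_m)


# Illustrative inputs: 1800 MHz, 50 m base station, 1.5 m mobile, 5 km range.
medium_city = cost231_hata_db(1800.0, 50.0, 1.5, 5.0)
metro = cost231_hata_db(1800.0, 50.0, 1.5, 5.0, metropolitan=True)
```

Raising an error outside the stated range is a deliberate guard: the regression coefficients were fitted only within it.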

2.5.4 Piecewise Linear (Multi-Slope) Model 

A common empirical method for modeling path loss in outdoor microcells and indoor channels is a piecewise
linear model of dB loss versus log-distance. This approximation is illustrated in Figure 2.9 for dB attenuation
versus log-distance, where the dots represent hypothetical empirical measurements and the piecewise linear model
represents an approximation to these measurements. A piecewise linear model with $N$ segments must specify
$N-1$ breakpoints $d_1, \ldots, d_{N-1}$ and the slopes corresponding to each segment $s_1, \ldots, s_N$. Different methods can
be used to determine the number and location of breakpoints to be used in the model. Once these are fixed, the
slopes corresponding to each segment can be obtained by linear regression. The piecewise linear model has been
used to model path loss for outdoor channels in [18] and for indoor channels in [48].

Figure 2.9: Piecewise Linear Model for Path Loss.



A special case of the piecewise model is the dual-slope model. The dual-slope model is characterized by
a constant path loss factor $K$ and a path loss exponent $\gamma_1$ above some reference distance $d_0$ up to some critical
distance $d_c$, after which point power falls off with path loss exponent $\gamma_2$:

$$P_r(d)\ \mathrm{dB} = \begin{cases} P_t + K - 10\gamma_1\log_{10}(d/d_0), & d_0 \le d \le d_c \\ P_t + K - 10\gamma_1\log_{10}(d_c/d_0) - 10\gamma_2\log_{10}(d/d_c), & d > d_c. \end{cases} \tag{2.35}$$



The path loss exponents, $K$, and $d_c$ are typically obtained via a regression fit to empirical data [34, 32]. The
two-ray model described in Section 2.4.1 for $d > h_t$ can be approximated with the dual-slope model, with one
breakpoint at the critical distance $d_c$ and attenuation slope $s_1 = 20$ dB/decade and $s_2 = 40$ dB/decade.
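The dual-slope model is a two-branch function of distance, which the sketch below evaluates directly. The function name and the numerical values in the example (a 1 m reference distance, a 100 m critical distance, and $K = -31.54$ dB) are illustrative assumptions; only the 20 and 40 dB/decade slopes come from the two-ray approximation described above.

```python
import math


def dual_slope_rx_power_db(d, pt_db, k_db, d0, dc, gamma1, gamma2):
    """Received power in dB under the dual-slope model with breakpoint dc."""
    if d < d0:
        raise ValueError("model is defined only for d >= d0")
    if d <= dc:
        return pt_db + k_db - 10 * gamma1 * math.log10(d / d0)
    # Beyond the critical distance the loss accumulated up to dc is kept,
    # and the steeper exponent gamma2 applies to the remaining distance.
    return (pt_db + k_db - 10 * gamma1 * math.log10(dc / d0)
            - 10 * gamma2 * math.log10(d / dc))


# Two-ray-like behavior: 20 dB/decade up to dc, then 40 dB/decade (gamma1=2, gamma2=4).
params = dict(pt_db=0.0, k_db=-31.54, d0=1.0, dc=100.0, gamma1=2.0, gamma2=4.0)
at_10 = dual_slope_rx_power_db(10.0, **params)
at_100 = dual_slope_rx_power_db(100.0, **params)
at_1000 = dual_slope_rx_power_db(1000.0, **params)
```

The two branches agree at $d = d_c$, so the modeled power is continuous across the breakpoint even though its slope changes.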

The multiple equations in the dual-slope model can be captured with the following dual-slope approximation
[17, 49]:

$$P_r = \frac{P_t K}{L(d)}, \tag{2.36}$$

where

$$L(d) \triangleq \left|\frac{d}{d_0}\right|^{\gamma_1} \sqrt[q]{1 + \left(\frac{d}{d_c}\right)^{(\gamma_1-\gamma_2)q}}. \tag{2.37}$$

In this expression, $q$ is a parameter that determines the smoothness of the path loss at the transition region close to
the breakpoint distance $d_c$. This model can be extended to more than two regions [18].



2.5.5 Indoor Attenuation Factors 

Indoor environments differ widely in the materials used for walls and floors, the layout of rooms, hallways,
windows, and open areas, the location and material in obstructing objects, and the size of each room and the number
of floors. All of these factors have a significant impact on path loss in an indoor environment. Thus, it is difficult
to find generic models that can be accurately applied to determine empirical path loss in a specific indoor setting.

Indoor path loss models must accurately capture the effects of attenuation across floors due to partitions, as
well as between floors. Measurements across a wide range of building characteristics and signal frequencies
indicate that the attenuation per floor is greatest for the first floor that is passed through and decreases with each
subsequent floor passed through. Specifically, measurements in [19, 21, 26, 22] indicate that at 900 MHz the
attenuation when the transmitter and receiver are separated by a single floor ranges from 10-20 dB, while subsequent
floor attenuation is 6-10 dB per floor for the next three floors, and then a few dB per floor for more than four floors.
At higher frequencies the attenuation loss per floor is typically larger [21, 20]. The attenuation per floor is thought
to decrease as the number of attenuating floors increases due to the scattering up the side of the building and
reflections from adjacent buildings. Partition materials and dielectric properties vary widely, and thus so do partition
losses. Measurements for the partition loss at different frequencies for different partition types can be found in
[1, 23, 24, 19, 25], and Table 2.1 indicates a few examples of partition losses measured at 900-1300 MHz from this
data. The partition loss obtained by different researchers for the same partition type at the same frequency often
varies widely, making it difficult to make generalizations about partition loss from a specific data set.



Partition Type              Partition Loss in dB
Cloth Partition             1.4
Double Plasterboard Wall    3.4
Foil Insulation             3.9
Concrete Wall               13
Aluminum Siding             20.4
All Metal                   26

Table 2.1: Typical Partition Losses



The experimental data for floor and partition loss can be added to an analytical or empirical dB path loss
model $P_L(d)$ as

$$P_r\ \mathrm{dBm} = P_t\ \mathrm{dBm} - P_L(d) - \sum_{i=1}^{N_f} \mathrm{FAF}_i - \sum_{i=1}^{N_p} \mathrm{PAF}_i. \tag{2.38}$$

$\mathrm{FAF}_i$ represents the floor attenuation factor (FAF) for the $i$th floor traversed by the signal, and $\mathrm{PAF}_i$ represents
the partition attenuation factor (PAF) associated with the $i$th partition traversed by the signal. The number of floors
and partitions traversed by the signal are $N_f$ and $N_p$, respectively.
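Equation (2.38) amounts to subtracting a list of floor losses and a list of partition losses from the link budget. The sketch below shows this bookkeeping; the function name is hypothetical, and the example path loss (60 dB) and floor losses (15 dB and 8 dB) are illustrative values chosen within the ranges quoted above, while the two partition losses are taken from Table 2.1.

```python
def indoor_rx_power_dbm(pt_dbm, path_loss_db, floor_losses_db=(), partition_losses_db=()):
    """Received power per (2.38): subtract each floor and partition loss in dB."""
    return pt_dbm - path_loss_db - sum(floor_losses_db) - sum(partition_losses_db)


# Illustrative link: 10 dBm transmitter, 60 dB distance-dependent path loss,
# two floors (15 dB for the first, 8 dB for the second), and two partitions
# (double plasterboard wall 3.4 dB, concrete wall 13 dB, from Table 2.1).
rx_dbm = indoor_rx_power_dbm(10.0, 60.0,
                             floor_losses_db=[15.0, 8.0],
                             partition_losses_db=[3.4, 13.0])
```

Passing the floor losses as a list, rather than a per-floor constant, lets the first floor carry a larger loss than later floors, matching the measurements described above.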

Another important factor for indoor systems where the transmitter is located outside the building is the
building penetration loss. Measurements indicate that building penetration loss is a function of frequency, height, and
the building materials. Building penetration loss on the ground floor typically ranges from 8-20 dB for 900 MHz
to 2 GHz [27, 28, 3]. The penetration loss decreases slightly as frequency increases, and also decreases by about
1.4 dB per floor at floors above the ground floor. This decrease is typically due to reduced clutter at higher floors
and the higher likelihood of a line-of-sight path. The type and number of windows in a building also have a
significant impact on penetration loss [29]. Measurements made behind windows have about 6 dB less penetration loss
than measurements made behind exterior walls. Moreover, plate glass has an attenuation of around 6 dB, whereas
lead-lined glass has an attenuation between 3 and 30 dB.



2.6 Simplified Path Loss Model 



The complexity of signal propagation makes it difficult to obtain a single model that characterizes path loss
accurately across a range of different environments. Accurate path loss models can be obtained from complex analytical
models or empirical measurements when tight system specifications must be met or the best locations for base
stations or access point layouts must be determined. However, for general tradeoff analysis of various system designs
it is sometimes best to use a simple model that captures the essence of signal propagation without resorting to
complicated path loss models, which are only approximations to the real channel anyway. Thus, the following
simplified model for path loss as a function of distance is commonly used for system design:



$$P_r = P_t K \left[\frac{d_0}{d}\right]^{\gamma}. \tag{2.39}$$

The dB attenuation is thus

$$P_r\ \mathrm{dBm} = P_t\ \mathrm{dBm} + K\ \mathrm{dB} - 10\gamma\log_{10}\left[\frac{d}{d_0}\right]. \tag{2.40}$$



In this approximation, $K$ is a unitless constant which depends on the antenna characteristics and the average
channel attenuation, $d_0$ is a reference distance for the antenna far-field, and $\gamma$ is the path loss exponent. The values
for $K$, $d_0$, and $\gamma$ can be obtained to approximate either an analytical or empirical model. In particular, the
free-space path loss model, two-ray model, Hata model, and the COST extension to the Hata model are all of the same
form as (2.39). Due to scattering phenomena in the antenna near-field, the model (2.39) is generally only valid at
transmission distances $d > d_0$, where $d_0$ is typically assumed to be 1-10 m indoors and 10-100 m outdoors.

When the simplified model is used to approximate empirical measurements, the value of $K < 1$ is sometimes
set to the free space path gain at distance $d_0$ assuming omnidirectional antennas:

$$K\ \mathrm{dB} = 20\log_{10}\frac{\lambda}{4\pi d_0}, \tag{2.41}$$

and this assumption is supported by empirical data for free-space path loss at a transmission distance of 100 m [34].
Alternatively, $K$ can be determined by measurement at $d_0$ or optimized (alone or together with $\gamma$) to minimize the
mean square error (MSE) between the model and the empirical measurements [34]. The value of $\gamma$ depends on
the propagation environment: for propagation that approximately follows a free-space or two-ray model $\gamma$ is set
to 2 or 4, respectively. The value of $\gamma$ for more complex environments can be obtained via a minimum mean
square error (MMSE) fit to empirical measurements, as illustrated in the example below. Alternatively $\gamma$ can be
obtained from an empirically-based model that takes into account frequency and antenna height [34]. A table
summarizing $\gamma$ values for different indoor and outdoor environments and antenna heights at 900 MHz and 1.9 GHz
taken from [30, 45, 34, 27, 26, 19, 22, ?] is given below. Path loss exponents at higher frequencies tend to be higher
[31, 26, 25, 27] while path loss exponents at higher antenna heights tend to be lower [34]. Note that the wide range
of empirical path loss exponents for indoor propagation may be due to attenuation caused by floors, objects, and
partitions, described in Section 2.5.5.



Environment                         $\gamma$ range
Urban macrocells                    3.7-6.5
Urban microcells                    2.7-3.5
Office Building (same floor)        1.6-3.5
Office Building (multiple floors)   2-6
Store                               1.8-2.2
Factory                             1.6-3.3
Home                                3

Table 2.2: Typical Path Loss Exponents
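The simplified model (2.40) and the free space choice of $K$ in (2.41) can be sketched as two small functions. The names below are hypothetical, and the speed of light is rounded to $3 \times 10^8$ m/s so that the wavelength at 900 MHz is 0.3333 m.

```python
import math

C = 3e8  # speed of light in m/s (rounded)


def free_space_k_db(fc_hz, d0_m):
    """K in dB from (2.41): free space path gain at the reference distance d0."""
    wavelength_m = C / fc_hz
    return 20 * math.log10(wavelength_m / (4 * math.pi * d0_m))


def simplified_rx_power_dbm(pt_dbm, k_db, gamma, d_m, d0_m):
    """Received power from the simplified model (2.40), valid for d > d0."""
    if d_m < d0_m:
        raise ValueError("model is only valid beyond the reference distance d0")
    return pt_dbm + k_db - 10 * gamma * math.log10(d_m / d0_m)


k_900 = free_space_k_db(900e6, 1.0)  # K for a 900 MHz carrier with d0 = 1 m
rx_free_space = simplified_rx_power_dbm(0.0, k_900, 2.0, 100.0, 1.0)  # gamma = 2
rx_two_ray = simplified_rx_power_dbm(0.0, k_900, 4.0, 100.0, 1.0)     # gamma = 4
```

Evaluated this way, $K$ at 900 MHz and $d_0 = 1$ m comes out at about $-31.53$ dB. Moving from $\gamma = 2$ to $\gamma = 4$ costs an extra 40 dB at 100 m, which illustrates how strongly the exponent drives the link budget.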



Example 2.3: Consider the set of empirical measurements of $P_r/P_t$ given in the table below for an indoor system
at 900 MHz. Find the path loss exponent $\gamma$ that minimizes the MSE between the simplified model (2.40) and the
empirical dB power measurements, assuming that $d_0 = 1$ m and $K$ is determined from the free space path gain
formula at this $d_0$. Find the received power at 150 m for the simplified path loss model with this path loss exponent
and a transmit power of 1 mW (0 dBm).

Distance from Transmitter    $M = P_r/P_t$
10 m                         -70 dB
20 m                         -75 dB
50 m                         -90 dB
100 m                        -110 dB
300 m                        -125 dB

Table 2.3: Path Loss Measurements

Solution: We first set up the MMSE error equation for the dB power measurements as

$$F(\gamma) = \sum_{i=1}^{5} [M_{measured}(d_i) - M_{model}(d_i)]^2,$$

where $M_{measured}(d_i)$ is the path loss measurement in Table 2.3 at distance $d_i$ and $M_{model}(d_i) = K - 10\gamma\log_{10}(d_i)$
is the path loss based on (2.40) at $d_i$. Using the free space path loss formula, $K = 20\log_{10}(.3333/(4\pi)) = -31.54$
dB. Thus

$$\begin{aligned} F(\gamma) &= (-70 + 31.54 + 10\gamma)^2 + (-75 + 31.54 + 13.01\gamma)^2 + (-90 + 31.54 + 16.99\gamma)^2 \\ &\quad + (-110 + 31.54 + 20\gamma)^2 + (-125 + 31.54 + 24.77\gamma)^2 \\ &= 21676.3 - 11654.9\gamma + 1571.47\gamma^2. \end{aligned} \tag{2.42}$$

Differentiating $F(\gamma)$ relative to $\gamma$ and setting it to zero yields

$$\frac{\partial F(\gamma)}{\partial \gamma} = -11654.9 + 3142.94\gamma = 0 \rightarrow \gamma = 3.71.$$

To find the received power at 150 m under the simplified path loss model with $K = -31.54$, $\gamma = 3.71$, and $P_t = 0$
dBm, we have $P_r = P_t + K - 10\gamma\log_{10}(d/d_0) = 0 - 31.54 - 10 \cdot 3.71\log_{10}(150) = -112.27$ dBm. Clearly
the measurements deviate from the simplified path loss model: this variation can be attributed to shadow fading,
described in Section 2.7.
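The fit in Example 2.3 has a closed form: writing $x_i = 10\log_{10}(d_i/d_0)$ and setting the derivative of $F(\gamma)$ to zero gives $\gamma = \sum_i (K - M_i)x_i / \sum_i x_i^2$. The sketch below applies this to the Table 2.3 data, using the value $K = -31.54$ dB from the example; the function and variable names are hypothetical.

```python
import math

# Measurements from Table 2.3: distance in meters and M = Pr/Pt in dB.
distances_m = [10, 20, 50, 100, 300]
measured_db = [-70, -75, -90, -110, -125]


def fit_path_loss_exponent(distances_m, measured_db, k_db, d0_m=1.0):
    """Least-squares gamma for the model M(d) = K - 10*gamma*log10(d/d0), K fixed."""
    x = [10 * math.log10(d / d0_m) for d in distances_m]
    numerator = sum((k_db - m) * xi for m, xi in zip(measured_db, x))
    denominator = sum(xi * xi for xi in x)
    return numerator / denominator


K_DB = -31.54  # value of K used in Example 2.3
gamma = fit_path_loss_exponent(distances_m, measured_db, K_DB)

# Received power at 150 m for a 0 dBm transmitter under the fitted model.
pr_150_dbm = 0.0 + K_DB - 10 * gamma * math.log10(150 / 1.0)
```

Because the code keeps full precision in the logarithms and in $\gamma$, its results differ from the hand calculation in the second decimal place; they round to the same exponent of 3.71.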



2.7 Shadow Fading 

A signal transmitted through a wireless channel will typically experience random variation due to blockage from 
objects in the signal path, giving rise to random variations of the received power at a given distance. Such variations 
are also caused by changes in reflecting surfaces and scattering objects. Thus, a model for the random attenuation 
due to these effects is also needed. Since the location, size, and dielectric properties of the blocking objects as 
well as the changes in reflecting surfaces and scattering objects that cause the random attenuation are generally 
unknown, statistical models must be used to characterize this attenuation. The most common model for this 
additional attenuation is log-normal shadowing. This model has been confirmed empirically to accurately model
the variation in received power in both outdoor and indoor radio propagation environments (see e.g. [34, 62]).

In the log-normal shadowing model the ratio of transmit-to-receive power $\psi = P_t/P_r$ is assumed random
with a log-normal distribution given by

$$p(\psi) = \frac{\xi}{\sqrt{2\pi}\,\sigma_{\psi_{dB}}\,\psi} \exp\left[-\frac{(10\log_{10}\psi - \mu_{\psi_{dB}})^2}{2\sigma^2_{\psi_{dB}}}\right], \qquad \psi > 0, \tag{2.43}$$



where $\xi = 10/\ln 10$, $\mu_{\psi_{dB}}$ is the mean of $\psi_{dB} = 10\log_{10}\psi$ in dB and $\sigma_{\psi_{dB}}$ is the standard deviation of $\psi_{dB}$, also
in dB. The mean can be based on an analytical model or empirical measurements. For empirical measurements
$\mu_{\psi_{dB}}$ equals the empirical path loss, since average attenuation from shadowing is already incorporated into the
measurements. For analytical models, $\mu_{\psi_{dB}}$ must incorporate both the path loss (e.g. from free-space or a ray
tracing model) as well as average attenuation from blockage. Alternatively, path loss can be treated separately
from shadowing, as described in the next section. Note that if $\psi$ is log-normal, then the received power and
receiver SNR will also be log-normal since these are just constant multiples of $\psi$. For received SNR the mean and
standard deviation of this log-normal random variable are also in dB. For log-normal received power, since the
random variable has units of power, its mean and standard deviation will be in dBm or dBW instead of dB. The
mean of $\psi$ (the linear average path gain) can be obtained from (2.43) as

$$\mu_\psi = E[\psi] = \exp\left[\frac{\mu_{\psi_{dB}}}{\xi} + \frac{\sigma^2_{\psi_{dB}}}{2\xi^2}\right]. \tag{2.44}$$

The conversion from the linear mean (in dB) to the log mean (in dB) is derived from (2.44) as

$$10\log_{10}\mu_\psi = \mu_{\psi_{dB}} + \frac{\sigma^2_{\psi_{dB}}}{2\xi}. \tag{2.45}$$
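The relation between the two means can be checked numerically. The sketch below evaluates (2.44) and (2.45) and shows that they agree; the function names and the example values (a 100 dB log mean with 8 dB standard deviation) are illustrative assumptions.

```python
import math

XI = 10 / math.log(10)  # xi = 10 / ln(10), about 4.343


def linear_mean(mu_db, sigma_db):
    """E[psi] from (2.44), given the mean and standard deviation of psi in dB."""
    return math.exp(mu_db / XI + sigma_db ** 2 / (2 * XI ** 2))


def linear_mean_in_db(mu_db, sigma_db):
    """10*log10(E[psi]) from (2.45)."""
    return mu_db + sigma_db ** 2 / (2 * XI)


# Illustrative values: log mean of 100 dB with an 8 dB standard deviation.
offset_db = linear_mean_in_db(100.0, 8.0) - 100.0
```

For an 8 dB standard deviation the linear mean, expressed in dB, sits about 7.4 dB above the log mean, and this offset grows with the square of $\sigma_{\psi_{dB}}$.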

Performance in log-normal shadowing is typically parameterized by the log mean $\mu_{\psi_{dB}}$, which is referred to
as the average dB path loss and is in units of dB. With a change of variables we see that the distribution of the dB
value of $\psi$ is Gaussian with mean $\mu_{\psi_{dB}}$ and standard deviation $\sigma_{\psi_{dB}}$:

$$p(\psi_{dB}) = \frac{1}{\sqrt{2\pi}\,\sigma_{\psi_{dB}}} \exp\left[-\frac{(\psi_{dB} - \mu_{\psi_{dB}})^2}{2\sigma^2_{\psi_{dB}}}\right]. \tag{2.46}$$



The log-normal distribution is defined by two parameters: $\mu_{\psi_{dB}}$ and $\sigma_{\psi_{dB}}$. Since $\psi = P_t/P_r$ is always greater
than one, $\mu_{\psi_{dB}}$ is always greater than or equal to zero. Note that the log-normal distribution (2.43) takes values for
$0 \le \psi \le \infty$. Thus, for $\psi < 1$, $P_r > P_t$, which is physically impossible. However, this probability will be very
small when $\mu_{\psi_{dB}}$ is large and positive. Thus, the log-normal model captures the underlying physical model most
accurately when $\mu_{\psi_{dB}} \gg 0$.

If the mean and standard deviation for the shadowing model are based on empirical measurements then the
question arises as to whether they should be obtained by taking averages of the linear or dB values of the empirical
measurements. Specifically, given empirical (linear) path loss measurements $\{p_i\}_{i=1}^{N}$, should the mean path loss
be determined as $\mu_\psi = \frac{1}{N}\sum_{i=1}^{N} p_i$ or as $\mu_{\psi_{dB}} = \frac{1}{N}\sum_{i=1}^{N} 10\log_{10} p_i$? A similar question arises for computing the
empirical variance. In practice it is more common to determine mean path loss and variance based on averaging
the dB values of the empirical measurements for several reasons. First, as we will see below, the mathematical
justification for the log-normal model is based on dB measurements. In addition, the literature shows that obtaining
empirical averages based on dB path loss measurements leads to a smaller estimation error [64]. Finally, as we
saw in Section 2.5.4, power falloff with distance models are often obtained by a piecewise linear approximation
to empirical measurements of dB power versus the log of distance [1].

Most empirical studies for outdoor channels support a standard deviation $\sigma_{\psi_{dB}}$ ranging from four to thirteen
dB [2, 17, 35, 58, 6]. The mean power $\mu_{\psi_{dB}}$ depends on the path loss and building properties in the area under
consideration. The mean power $\mu_{\psi_{dB}}$ varies with distance due to path loss and the fact that average attenuation
from objects increases with distance due to the potential for a larger number of attenuating objects.

The Gaussian model for the distribution of the mean received signal in dB can be justified by the following
attenuation model when shadowing is dominated by the attenuation from blocking objects. The attenuation of a
signal as it travels through an object of depth $d$ is approximately equal to

$$s(d) = e^{-\alpha d}, \tag{2.47}$$

where $\alpha$ is an attenuation constant that depends on the object's materials and dielectric properties. If we assume
that $\alpha$ is approximately equal for all blocking objects, and that the $i$th blocking object has a random depth $d_i$, then
the attenuation of a signal as it propagates through this region is

$$s(d_t) = e^{-\alpha \sum_i d_i} = e^{-\alpha d_t}, \tag{2.48}$$

where $d_t = \sum_i d_i$ is the sum of the random object depths through which the signal travels. If there are many
objects between the transmitter and receiver, then by the Central Limit Theorem we can approximate $d_t$ by a
Gaussian random variable. Thus, $\log s(d_t) = -\alpha d_t$ will have a Gaussian distribution with mean $\mu$ and standard
deviation $\sigma$. The value of $\sigma$ will depend on the environment.



Example 2.4:

In Example 2.3 we found that the exponent for the simplified path loss model that best fits the measurements
in Table 2.3 was $\gamma = 3.71$. Assuming the simplified path loss model with this exponent and the same $K = -31.54$
dB, find $\sigma^2_{\psi_{dB}}$, the variance of log-normal shadowing about the mean path loss based on these empirical
measurements.

Solution: The sample variance relative to the simplified path loss model with $\gamma = 3.71$ is

$$\sigma^2_{\psi_{dB}} = \frac{1}{5}\sum_{i=1}^{5} [M_{measured}(d_i) - M_{model}(d_i)]^2,$$

where $M_{measured}(d_i)$ is the path loss measurement in Table 2.3 at distance $d_i$ and $M_{model}(d_i) = K - 37.1\log_{10}(d_i)$.
Thus

$$\begin{aligned} \sigma^2_{\psi_{dB}} &= \frac{1}{5}\left[(-70 + 31.54 + 37.1)^2 + (-75 + 31.54 + 48.27)^2 + (-90 + 31.54 + 63.03)^2 \right. \\ &\qquad \left. + (-110 + 31.54 + 74.2)^2 + (-125 + 31.54 + 91.90)^2\right] \\ &= 13.29. \end{aligned}$$

Thus, the standard deviation of shadow fading on this path is $\sigma_{\psi_{dB}} = 3.65$ dB. Note that the bracketed term in the
above expression equals the MMSE formula (2.42) from Example 2.3 with $\gamma = 3.71$.

Extensive measurements have been taken to characterize the empirical correlation of shadowing over distance
for different environments at different frequencies, e.g. [58, 59, 63, 60, 61]. The most common analytical model
for this correlation, first proposed by Gudmundson [58] based on empirical measurements, assumes the shadowing
$\psi(d)$ is a first-order autoregressive process where the correlation between shadow fading at two points separated
by distance $\delta$ is characterized by

$$A(\delta) = E[(\psi_{dB}(d) - \mu_{\psi_{dB}})(\psi_{dB}(d+\delta) - \mu_{\psi_{dB}})] = \sigma^2_{\psi_{dB}}\,\rho_D^{\delta/D}, \tag{2.49}$$

where $\rho_D$ is the correlation between two points separated by a fixed distance $D$. This correlation must be obtained
empirically, and varies with the propagation environment and carrier frequency. Measurements indicate that for
suburban macrocells with $f_c = 900$ MHz, $\rho_D = .82$ for $D = 100$ m and for urban microcells with $f_c \approx 2$ GHz,
$\rho_D = .3$ for $D = 10$ m [58, 60]. This model can be simplified and its empirical dependence removed by setting
$\rho_D = 1/e$ for distance $D = X_c$, which yields

$$A(\delta) = \sigma^2_{\psi_{dB}}\,e^{-\delta/X_c}. \tag{2.50}$$

The decorrelation distance $X_c$ in this model is the distance at which the signal autocorrelation equals $1/e$ of its
maximum value and is on the order of the size of the blocking objects or clusters of these objects. For outdoor
systems $X_c$ typically ranges from 50 to 100 m [60, 63]. For users moving at velocity $v$, the shadowing decorrelation
in time $\tau$ is obtained by substituting $v\tau = \delta$ in (2.49) or (2.50). Autocorrelation relative to angular spread, which
is useful for the multiple antenna systems treated in Chapter 10, has been investigated in [60, 59].

The first-order autoregressive correlation model (2.49) and its simplified form (2.50) are easy to analyze and
to simulate. Specifically, one can simulate $\psi_{dB}$ by first generating a white Gaussian noise process with power
$\sigma^2_{\psi_{dB}}$ and then passing it through a first order filter with response $\rho_D^{\delta/D}$ for a correlation characterized by (2.49) or
$e^{-\delta/X_c}$ for a correlation characterized by (2.50). The filter output will produce a shadowing random process with
the desired correlation properties [58, 6].
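One common way to realize such a first-order filter in discrete space is the recursion $x_n = a\,x_{n-1} + \sigma_{\psi_{dB}}\sqrt{1-a^2}\,w_n$ with $a = e^{-\Delta/X_c}$, where $\Delta$ is the sample spacing and $w_n$ is unit-variance white Gaussian noise. The $\sqrt{1-a^2}$ scaling is a normalization chosen here so that the output variance stays at $\sigma^2_{\psi_{dB}}$; it is an assumption of this sketch rather than a detail given in the text, as are the function name and the example parameters.

```python
import math
import random


def simulate_shadowing_db(n, step_m, xc_m, sigma_db, seed=0):
    """Zero-mean correlated shadowing samples in dB, spaced step_m meters apart.

    Implements x[k] = a*x[k-1] + sigma*sqrt(1 - a^2)*w[k] with a = exp(-step/Xc),
    whose autocorrelation is sigma^2 * exp(-delta/Xc) at lags delta = k*step.
    """
    rng = random.Random(seed)
    a = math.exp(-step_m / xc_m)
    drive = sigma_db * math.sqrt(1 - a * a)  # keeps the stationary variance at sigma^2
    x = rng.gauss(0.0, sigma_db)             # start in the stationary distribution
    samples = [x]
    for _ in range(n - 1):
        x = a * x + drive * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Illustrative run: 8 dB shadowing, 50 m decorrelation distance, one sample every 10 m.
trace_db = simulate_shadowing_db(100000, step_m=10.0, xc_m=50.0, sigma_db=8.0, seed=1)
```

For a mobile moving at velocity $v$, the same routine produces a time series by choosing the step as $v$ times the sampling interval.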

2.8 Combined Path Loss and Shadowing 

Models for path loss and shadowing can be superimposed to capture power falloff versus distance along with the
random attenuation about this path loss from shadowing. In this combined model, average dB path loss ($\mu_{\psi_{dB}}$)
is characterized by the path loss model and shadow fading, with a mean of 0 dB, creates variations about this
path loss, as illustrated by the path loss and shadowing curve in Figure 2.1. Specifically, this curve plots the
combination of the simplified path loss model (2.39) and the log-normal shadowing random process defined by
(2.46) and (2.50). For this combined model the ratio of received to transmitted power in dB is given by:

$$\frac{P_r}{P_t}\ (\mathrm{dB}) = 10\log_{10}K - 10\gamma\log_{10}\frac{d}{d_0} - \psi_{dB}, \tag{2.51}$$

where $\psi_{dB}$ is a Gauss-distributed random variable with mean zero and variance $\sigma^2_{\psi_{dB}}$. In (2.51) and as shown in
Figure 2.1, the path loss decreases linearly relative to $\log_{10} d$ with a slope of $10\gamma$ dB/decade, where $\gamma$ is the path
loss exponent. The variations due to shadowing change more rapidly, on the order of the decorrelation distance
$X_c$.

The prior examples 2.3 and 2.4 illustrate the combined model for path loss and log-normal shadowing based
on the measurements in Table 2.3, where path loss obeys the simplified path loss model with $K = -31.54$ dB and
path loss exponent $\gamma = 3.71$ and shadowing obeys the log-normal model with mean given by the path loss model
and standard deviation $\sigma_{\psi_{dB}} = 3.65$ dB.

2.9 Outage Probability under Path Loss and Shadowing 

The combined effects of path loss and shadowing have important implications for wireless system design. In
wireless systems there is typically a target minimum received power level $P_{min}$ below which performance becomes
unacceptable (e.g. the voice quality in a cellular system is too poor to understand). However, with shadowing
the received power at any given distance from the transmitter is log-normally distributed with some probability
of falling below $P_{min}$. We define outage probability $p_{out}(P_{min}, d)$ under path loss and shadowing to be the
probability that the received power at a given distance $d$, $P_r(d)$, falls below $P_{min}$: $p_{out}(P_{min}, d) = p(P_r(d) <
P_{min})$. For the combined path loss and shadowing model of Section 2.8 this becomes

$$p(P_r(d) \le P_{min}) = 1 - Q\left(\frac{P_{min} - (P_t + 10\log_{10}K - 10\gamma\log_{10}(d/d_0))}{\sigma_{\psi_{dB}}}\right), \tag{2.52}$$

where the $Q$ function is defined as the probability that a Gaussian random variable $x$ with mean zero and variance
one is bigger than $z$:

$$Q(z) \triangleq p(x > z) = \int_z^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy. \tag{2.53}$$

The conversion between the $Q$ function and complementary error function is

$$Q(z) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{z}{\sqrt{2}}\right). \tag{2.54}$$

We will omit the parameters of $p_{out}$ when the context is clear or in generic references to outage probability.



Example 2.5:

Find the outage probability at 150 m for a channel based on the combined path loss and shadowing
models of Examples 2.3 and 2.4, assuming a transmit power of $P_t = 10$ mW and minimum power requirement
$P_{min} = -110.5$ dBm.

Solution: We have $P_t = 10$ mW $= 10$ dBm.

$$\begin{aligned} p_{out}(-110.5\ \mathrm{dBm}, 150\ \mathrm{m}) &= p(P_r(150\ \mathrm{m}) < -110.5\ \mathrm{dBm}) \\ &= 1 - Q\left(\frac{P_{min} - (P_t + 10\log_{10}K - 10\gamma\log_{10}(d/d_0))}{\sigma_{\psi_{dB}}}\right) \\ &= 1 - Q\left(\frac{-110.5 - (10 - 31.54 - 37.1\log_{10}[150])}{3.65}\right) \\ &= .0121. \end{aligned}$$

An outage probability of 1% is a typical target in wireless system designs.
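The outage calculation in (2.52) needs only the $Q$ function, which (2.54) expresses through the complementary error function available in Python's standard library. The sketch below reproduces Example 2.5 with the parameters of Examples 2.3 and 2.4; the function names are hypothetical.

```python
import math


def q_function(z):
    """Gaussian tail probability Q(z), computed via erfc as in (2.54)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def outage_probability(pmin_dbm, pt_dbm, k_db, gamma, d_m, d0_m, sigma_db):
    """Outage probability (2.52) under the combined path loss and shadowing model."""
    mean_rx_dbm = pt_dbm + k_db - 10 * gamma * math.log10(d_m / d0_m)
    return 1 - q_function((pmin_dbm - mean_rx_dbm) / sigma_db)


# Example 2.5: Pt = 10 dBm, K = -31.54 dB, gamma = 3.71, d = 150 m, d0 = 1 m,
# sigma = 3.65 dB, and a minimum power requirement of -110.5 dBm.
p_out = outage_probability(-110.5, 10.0, -31.54, 3.71, 150.0, 1.0, 3.65)
```

The mean received power here is about $-102.3$ dBm, so the requirement sits roughly 2.25 standard deviations below the mean, which is why the outage probability is close to 1%.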



2.10 Cell Coverage Area 

The cell coverage area in a cellular system is defined as the expected percentage of area within a cell that has
received power above a given minimum. Consider a base station inside a circular cell of a given radius $R$. All
mobiles within the cell require some minimum received SNR for acceptable performance. Assuming some reasonable
noise and interference model, the SNR requirement translates to a minimum received power $P_{min}$ throughout the
cell. The transmit power at the base station is designed for an average received power at the cell boundary of $\overline{P}_r$,
averaged over the shadowing variations. However, shadowing will cause some locations within the cell to have
received power below $\overline{P}_r$, and others will have received power exceeding $\overline{P}_r$. This is illustrated in Figure 2.10,
where we show contours of constant received power based on a fixed transmit power at the base station for path loss
and average shadowing and for path loss and random shadowing. For path loss and average shadowing constant
power contours form a circle around the base station, since combined path loss and average shadowing is the same
at a uniform distance from the base station. For path loss and random shadowing the contours form an amoeba-like
shape due to the random shadowing variations about the average. The constant power contours for combined path
loss and random shadowing indicate the challenge shadowing poses in cellular system design. Specifically, it is
not possible for all users at the cell boundary to receive the same power level. Thus, the base station must either
transmit extra power to ensure users affected by shadowing receive their minimum required power $P_{min}$, which
causes excessive interference to neighboring cells, or some users within the cell will not meet their minimum
received power requirement. In fact, since the Gaussian distribution has infinite tails, there is a nonzero probability
that any mobile within the cell will have a received power that falls below the minimum target, even if the mobile
is close to the base station. This makes sense intuitively since a mobile may be in a tunnel or blocked by a large
building, regardless of its proximity to the base station.

Figure 2.10: Contours of Constant Received Power.

We now compute cell coverage area under path loss and shadowing. The percentage of area within a cell
where the received power exceeds the minimum required power $P_{min}$ is obtained by taking an incremental area
$dA$ at radius $r$ from the base station (BS) in the cell, as shown in Figure 2.10. Let $P_r(r)$ be the received power
in $dA$ from combined path loss and shadowing. Then the total area within the cell where the minimum power
requirement is exceeded is obtained by integrating over all incremental areas where this minimum is exceeded:



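$$C = E\left[\frac{1}{\pi R^2}\int_{\text{cell area}} 1[P_r(r) > P_{min} \text{ in } dA]\, dA\right] = \frac{1}{\pi R^2}\int_{\text{cell area}} E\left[1[P_r(r) > P_{min} \text{ in } dA]\right] dA, \tag{2.55}$$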



where $1[\cdot]$ denotes the indicator function. Define $P_A = p(P_r(r) > P_{min})$ in $dA$. Then $P_A = E[1[P_r(r) > P_{min} \text{ in } dA]]$.
Making this substitution in (2.55) and using polar coordinates for the integration yields

$$C = \frac{1}{\pi R^2}\int_{\text{cell area}} P_A\, dA = \frac{1}{\pi R^2}\int_0^{2\pi}\int_0^R P_A\, r\, dr\, d\theta. \tag{2.56}$$

The outage probability of the cell is defined as the percentage of area within the cell that does not meet its
minimum power requirement $P_{min}$, i.e. it equals $1 - C$.

Given the log-normal distribution for the shadowing,



$$p(P_r(r) > P_{min}) = Q\left(\frac{P_{min} - (P_t + 10\log_{10} K - 10\gamma\log_{10}(r/d_0))}{\sigma_{\psi_{dB}}}\right) = 1 - p_{out}(P_{min}, r), \tag{2.57}$$



where $p_{out}$ is the outage probability defined in (2.52) with $d = r$. Locations within the cell with received power
below $P_{min}$ are said to be outage locations.

Combining (2.56) and (2.57) we get$^4$

$$C = \frac{2}{R^2}\int_0^R r\, Q\left(a + b\ln\frac{r}{R}\right) dr, \tag{2.58}$$

where

$$a = \frac{P_{min} - \overline{P}_r(R)}{\sigma_{\psi_{dB}}}, \qquad b = \frac{10\gamma\log_{10}(e)}{\sigma_{\psi_{dB}}}, \tag{2.59}$$

and $\overline{P}_r(R) = P_t + 10\log_{10} K - 10\gamma\log_{10}(R/d_0)$ is the received power at the cell boundary (distance $R$ from the
base station) due to path loss alone. This integral yields a closed-form solution for $C$ in terms of $a$ and $b$:



$$C = Q(a) + \exp\left(\frac{2 - 2ab}{b^2}\right) Q\left(\frac{2 - ab}{b}\right). \tag{2.60}$$



If the target minimum received power equals the average power at the cell boundary, $P_{min} = \overline{P}_r(R)$, then $a = 0$
and the coverage area simplifies to

$$C = \frac{1}{2} + \exp\left(\frac{2}{b^2}\right) Q\left(\frac{2}{b}\right). \tag{2.61}$$

Note that with this simplification $C$ depends only on the ratio $\gamma/\sigma_{\psi_{dB}}$. Moreover, due to the symmetry of the
Gaussian distribution, under this assumption the outage probability at the cell boundary is $p_{out}(\overline{P}_r(R), R) = 0.5$.
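As a sanity check on the closed form, the integral (2.58) can be evaluated numerically and compared with (2.60). The sketch below uses the substitution $x = r/R$, so that $C = 2\int_0^1 x\, Q(a + b\ln x)\, dx$, and a simple midpoint rule. The function names, the step count, and the test values of $a$ and $b$ are illustrative assumptions, not part of the text.

```python
import math


def q_function(z: float) -> float:
    """Gaussian Q function, Q(z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def coverage_closed_form(a: float, b: float) -> float:
    """Closed-form coverage area (2.60)."""
    return q_function(a) + math.exp((2.0 - 2.0 * a * b) / b**2) * q_function((2.0 - a * b) / b)


def coverage_numeric(a: float, b: float, steps: int = 20000) -> float:
    """Midpoint-rule evaluation of (2.58) after substituting x = r/R."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoints avoid evaluating log(0)
        total += x * q_function(a + b * math.log(x))
    return 2.0 * h * total


for a, b in [(0.0, 2.17), (1.26, 4.41), (-1.479, 4.41)]:
    print(a, b, coverage_closed_form(a, b), coverage_numeric(a, b))
```

The two columns should agree to several decimal places for each $(a, b)$ pair.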



$^4$ Recall that (2.57) is generally only valid for $r > d_0$, yet to simplify the analysis we have applied the model for all $r$. This approximation
will have little impact on coverage area, since $d_0$ is typically very small compared to $R$ and the outage probability for $r < d_0$ is negligible.

Example 2.6:

Find the coverage area for a cell with the combined path loss and shadowing models of Examples 2.3 and 2.4,
a cell radius of 600 m, a base station transmit power of $P_t = 100$ mW $= 20$ dBm, and a minimum received power
requirement of $P_{min} = -110$ dBm and of $P_{min} = -120$ dBm.



Solution: We first consider $P_{min} = -110$ dBm and check if $a = 0$ to determine whether to use the full formula (2.60) or
the simplified formula (2.61). We have $\overline{P}_r(R) = P_t + K - 10\gamma\log_{10}(600) = 20 - 31.54 - 37.1\log_{10}[600] =
-114.6$ dBm $\neq -110$ dBm, so we use (2.60). Evaluating $a$ and $b$ from (2.59) yields $a = (-110 + 114.6)/3.65 =
1.26$ and $b = 37.1 \cdot .434/3.65 = 4.41$. Substituting these into (2.60) yields

$$C = Q(1.26) + \exp\left(\frac{2 - 2(1.26 \cdot 4.41)}{4.41^2}\right) Q\left(\frac{2 - (1.26)(4.41)}{4.41}\right) = .59,$$

which would be a very low coverage value for an operational cellular system (lots of unhappy customers). Now
considering the less stringent received power requirement $P_{min} = -120$ dBm yields $a = (-120 + 114.6)/3.65 =
-1.479$ and the same $b = 4.41$. Substituting these values into (2.60) yields $C = .988$, a much more acceptable
value for coverage area.
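The two cases of Example 2.6 can be reproduced in a few lines. This is a sketch under stated assumptions: the helper name `coverage_area` is illustrative, and $K = -31.54$ dB, $\gamma = 3.71$, $d_0 = 1$ m and $\sigma_{\psi_{dB}} = 3.65$ dB are inferred from the numbers in the solution above.

```python
import math


def q_function(z: float) -> float:
    """Gaussian Q function, Q(z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def coverage_area(p_min_dbm, p_t_dbm, k_db, gamma, radius, d0, sigma_db):
    """Coverage area from (2.59) and (2.60), with all powers in dB units."""
    pr_boundary = p_t_dbm + k_db - 10.0 * gamma * math.log10(radius / d0)
    a = (p_min_dbm - pr_boundary) / sigma_db
    b = 10.0 * gamma * math.log10(math.e) / sigma_db
    return q_function(a) + math.exp((2.0 - 2.0 * a * b) / b**2) * q_function((2.0 - a * b) / b)


# Assumed model parameters, read off the worked solution of Example 2.6.
params = dict(p_t_dbm=20.0, k_db=-31.54, gamma=3.71, radius=600.0, d0=1.0, sigma_db=3.65)
c_110 = coverage_area(p_min_dbm=-110.0, **params)
c_120 = coverage_area(p_min_dbm=-120.0, **params)
print(f"P_min = -110 dBm: C = {c_110:.3f}")
print(f"P_min = -120 dBm: C = {c_120:.3f}")
```

Because the script does not round $a$ and $b$ before substituting, the first value can differ from the quoted .59 in the second decimal place.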



Example 2.7: Consider a cellular system designed so that $P_{min} = \overline{P}_r(R)$, i.e. the received power due to path
loss and average shadowing at the cell boundary equals the minimum received power required for acceptable
performance. Find the coverage area for path loss values $\gamma = 2, 4, 6$ and $\sigma_{\psi_{dB}} = 4, 8, 12$ and explain how coverage
changes as $\gamma$ and $\sigma_{\psi_{dB}}$ increase.

Solution: For $P_{min} = \overline{P}_r(R)$ we have $a = 0$ so coverage is given by the formula (2.61). The coverage area
thus depends only on the value for $b = 10\gamma\log_{10}[e]/\sigma_{\psi_{dB}}$, which in turn depends only on the ratio $\gamma/\sigma_{\psi_{dB}}$. The
following table contains coverage area evaluated from (2.61) for the different $\gamma$ and $\sigma_{\psi_{dB}}$ values.



$\gamma \backslash \sigma_{\psi_{dB}}$      4       8       12
2                 .77     .67     .63
4                 .85     .77     .71
6                 .90     .83     .77

Table 2.4: Coverage Area for Different $\gamma$ and $\sigma_{\psi_{dB}}$
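The entries of Table 2.4 follow directly from (2.61). The sketch below recomputes them; the function name is illustrative. Computed values may differ from the tabulated ones in the last digit, since some table entries appear to be truncated rather than rounded.

```python
import math


def q_function(z: float) -> float:
    """Gaussian Q function, Q(z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def coverage_boundary_target(gamma: float, sigma_db: float) -> float:
    """Coverage area (2.61), valid when P_min equals the mean boundary power (a = 0)."""
    b = 10.0 * gamma * math.log10(math.e) / sigma_db
    return 0.5 + math.exp(2.0 / b**2) * q_function(2.0 / b)


gammas = (2, 4, 6)
sigmas = (4, 8, 12)
table = {(g, s): coverage_boundary_target(g, s) for g in gammas for s in sigmas}
for g in gammas:
    print(g, "  ".join(f"{table[(g, s)]:.3f}" for s in sigmas))
```

Reading across a row shows coverage falling as the shadowing standard deviation grows, and reading down a column shows it rising with the path loss exponent.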



Not surprisingly, for fixed $\gamma$ the coverage area increases as $\sigma_{\psi_{dB}}$ decreases: that is because a smaller $\sigma_{\psi_{dB}}$ means
less variation about the mean path loss, and since with no shadowing we have 100% coverage (since $P_{min} =
\overline{P}_r(R)$), we expect that as $\sigma_{\psi_{dB}}$ decreases to zero, coverage area increases to 100%. It is a bit more puzzling that
for a fixed $\sigma_{\psi_{dB}}$ coverage area increases as $\gamma$ increases, since a larger $\gamma$ implies that received signal power falls off
more quickly. But recall that we have set $P_{min} = \overline{P}_r(R)$, so the faster power falloff is already taken into account
(i.e. we need to transmit at much higher power with $\gamma = 6$ than with $\gamma = 2$ for this equality to hold). The reason
coverage area increases with path loss exponent under this assumption is that, as $\gamma$ increases, the transmit power
must increase to satisfy $P_{min} = \overline{P}_r(R)$. This results in higher average power throughout the cell, resulting in a
higher coverage area.

Chapter 2 Problems



1. Under a free space path loss model, find the transmit power required to obtain a received power of 1 dBm for
a wireless system with isotropic antennas ($G_l = 1$) and a carrier frequency $f = 5$ GHz, assuming a distance
$d = 10$ m. Repeat for $d = 100$ m.

2. For a two-path propagation model with transmitter-receiver separation $d = 100$ m, $h_t = 10$ m, and $h_r = 2$
m, find the delay spread between the two signals.

3. For the two-ray model, show how a Taylor series approximation applied to (2.13) results in the approximation

$$\Delta\phi = \frac{2\pi(r' + r - l)}{\lambda} \approx \frac{4\pi h_t h_r}{\lambda d}.$$

4. For the two-ray path loss model, derive an approximate expression for the distance values below the critical
distance $d_c$ at which signal nulls occur.

5. Find the critical distance $d_c$ under the two-path model for a large macrocell in a suburban area with the
base station mounted on a tower or building ($h_t = 20$ m), the receivers at height $h_r = 3$ m, and $f_c = 2$ GHz.
Is this a good size for cell radius in a suburban macrocell? Why or why not?

6. Suppose that instead of a ground reflection, a two-path model consists of a LOS component and a signal
reflected off a building to the left (or right) of the LOS path. Where must the building be located relative to
the transmitter and receiver for this model to be the same as the two-path model with a LOS component and
ground reflection?

7. Consider a two-path channel with impulse response $h(t) = \alpha_1\delta(\tau) + \alpha_2\delta(\tau - .022\ \mu\text{sec})$. Find the distance
separating the transmitter and receiver, as well as $\alpha_1$ and $\alpha_2$, assuming free space path loss on each path with
a reflection coefficient of -1. Assume the transmitter and receiver are located 8 meters above the ground and
the carrier frequency is 900 MHz.

8. Directional antennas are a powerful tool to reduce the effects of multipath as well as interference. In
particular, directional antennas along the LOS path for the two-ray model can reduce the attenuation effect of
the ground wave cancellation, as will be illustrated in this problem. Plot the dB power ($10\log_{10} P_r$) versus
log distance ($\log_{10} d$) for the two-ray model with the parameters $f = 900$ MHz, $R = -1$, $h_t = 50$ m, $h_r = 2$ m,
$G_l = 1$, and the following values for $G_r$: $G_r = 1, .316, .1,$ and $.01$ (i.e. $G_r = 0, -5, -10,$ and $-20$ dB,
respectively). Each of the 4 plots should range in distance from $d = 1$ m to $d = 100{,}000$ m. Also calculate
and mark the critical distance $d_c = 4h_t h_r/\lambda$ on each plot, and normalize the plots to start at approximately
0 dB. Finally, show the piecewise linear model with flat power falloff up to distance $h_t$, falloff $10\log_{10}(d^{-2})$
for $h_t < d < d_c$, and falloff $10\log_{10}(d^{-4})$ for $d \geq d_c$. (On the power loss versus log distance plot the
piecewise linear curve becomes a set of three straight lines with slope 0, 2, and 4, respectively.) Note that
at large distances it becomes increasingly difficult to have $G_r \ll G_l$ since it requires extremely precise
angular directivity in the antennas.

9. What average power falloff with distance would you expect for the 10-ray model and why?

10. For the 10-ray model, assume the transmitter and receiver are in the middle of a street of width 20 m and are
at the same height. The transmitter-receiver separation is 500 m. Find the delay spread for this model.

Figure 2.11: System with Scattering (transmitter at (0, 0), receiver at (0, d)).



11. Consider a system with a transmitter, receiver, and scatterer as shown in Figure 2.11. Assume the transmitter
and receiver are both at heights $h_t = h_r = 4$ m and are separated by distance $d$, with the scatterer at distance
$.5d$ along both dimensions in a two-dimensional grid of the ground, i.e. on such a grid the transmitter is
located at $(0, 0)$, the receiver is located at $(0, d)$ and the scatterer is located at $(.5d, .5d)$. Assume a radar
cross section of 20 dBm$^2$. Find the path loss of the scattered signal for $d = 1, 10, 100,$ and $1000$ meters.
Compare with the path loss at these distances if the signal is just reflected with reflection coefficient $R = -1$.

12. Under what conditions is the simplified path loss model (2.39) the same as the free space path loss model
(2.7)?

13. Consider a receiver with noise power -160 dBm within the signal bandwidth of interest. Assume a simplified
path loss model with $d_0 = 1$ m, $K$ obtained from the free space path loss formula with omnidirectional
antennas and $f_c = 1$ GHz, and $\gamma = 4$. For a transmit power of $P_t = 10$ mW, find the maximum distance
between the transmitter and receiver such that the received signal-to-noise power ratio is 20 dB.

14. This problem shows how different propagation models can lead to very different SNRs (and therefore
different link performance) for a given system design. Consider a linear cellular system using frequency division,
as might operate along a highway or rural road, as shown in Figure 2.12 below. Each cell is allocated a
certain band of frequencies, and these frequencies are reused in cells spaced a distance $d$ away. Assume the
system has square cells which are two kilometers per side, and that all mobiles transmit at the same power $P$.
For the following propagation models, determine the minimum distance that the cells operating in the same
frequency band must be spaced so that uplink SNR (the ratio of the minimum received signal-to-interference
power (S/I) from mobiles to the base station) is greater than 20 dB. You can ignore all interferers except
those from the two nearest cells operating at the same frequency.

(a) Propagation for both signal and interference follows a free-space model.

(b) Propagation for both signal and interference follows the simplified path loss model (2.39) with $d_0 =
100$ m, $K = 1$, and $\gamma = 3$.

(c) Propagation for the signal follows the simplified path loss model with $d_0 = 100$ m, $K = 1$, and $\gamma = 2$,
while propagation of the interference follows the same model but with $\gamma = 4$.

15. Find the median path loss under the Hata model assuming $f_c = 900$ MHz, $h_t = 20$ m, $h_r = 5$ m and
$d = 100$ m for a large urban city, a small urban city, a suburb, and a rural area. Explain qualitatively the path
loss differences for these 4 environments.

16. (Computer plots) Find parameters for a piecewise linear model with three segments to approximate the
two-path model path loss (2.4.1) over distances between 10 and 1000 meters, assuming $h_t = 10$ m and $h_r = 2$ m.
Plot the path loss and the piecewise linear approximation using these parameters over this distance range.

Figure 2.12: Lineal Cells (cells 2 km per side; markers denote base station/cell center).



17. Using the indoor attenuation model, determine the required transmit power for a desired received power of
-110 dBm for a signal transmitted over 100 m that goes through 3 floors with attenuation 15 dB, 10 dB, and
6 dB, respectively, as well as 2 double plasterboard walls. Assume a reference distance $d_0 = 1$ and constant
$K = 0$ dB.

18. The following table lists a set of empirical path loss measurements.

Distance from Transmitter     $P_r/P_t$
5 m                           -60 dB
25 m                          -80 dB
65 m                          -105 dB
110 m                         -115 dB
400 m                         -135 dB
1000 m                        -150 dB



(a) Find the parameters of a simplified path loss model plus log normal shadowing that best fit this data.

(b) Find the path loss at 2 km based on this model.

(c) Find the outage probability at a distance $d$ assuming the received power at $d$ due to path loss alone is
10 dB above the required power for nonoutage.

19. Consider a cellular system operating at 900 MHz where propagation follows free space path loss with
variations from log normal shadowing with $\sigma = 6$ dB. Suppose that for acceptable voice quality a signal-to-noise
power ratio of 15 dB is required at the mobile. Assume the base station transmits at 1 W and its antenna has
a 3 dB gain. There is no antenna gain at the mobile and the receiver noise in the bandwidth of interest is -10
dBm. Find the maximum cell size so that a mobile on the cell boundary will have acceptable voice quality
90% of the time.

20. In this problem we will simulate the log normal fading process over distance based on the autocorrelation
model (2.50). As described in the text, the simulation first generates a white noise process and then passes
it through a first order filter with a pole at $e^{-\delta/X_c}$. Assume $X_c = 20$ m and plot the resulting log normal
fading process over a distance $d$ ranging from 0 to 200 m, sampling the process every meter. You should
normalize your plot about 0 dB, since the mean of the log normal shadowing is captured by path loss.

21. In this problem we will explore the impact of different log-normal shadowing parameters on outage
probability. Consider a cellular system where the received signal power is distributed according to a log-normal
distribution with mean $\mu_\psi$ dBm and standard deviation $\sigma_\psi$ dBm. Assume the received signal power must be
above 10 dBm for acceptable performance.

(a) What is the outage probability when the log-normal distribution has $\mu_\psi = 15$ dBm and $\sigma_\psi = 8$ dBm?

(b) For $\sigma_\psi = 4$ dBm, what value of $\mu_\psi$ is required such that the outage probability is less than 1%, a typical
value for high-quality PCS systems?

(c) Repeat (b) for $\sigma_\psi = 12$ dBm.

(d) One proposed technique to reduce outage probability is to use macrodiversity, where a mobile unit’s 
signal is received by multiple base stations and then combined. This can only be done if multiple base 
stations are able to receive a given mobile’s signal, which is typically the case for CDMA systems. 
Explain why this might reduce outage probability. 

22. Derive the formula for coverage area (2.61) by applying integration by parts to (2.59). 

23. Find the coverage area for a microcellular system where path loss follows the simplified model with $\gamma = 3$,
$d_0 = 1$, and $K = 0$ dB and there is also log normal shadowing with $\sigma = 4$ dB. Assume a cell radius of 100
m, a transmit power of 80 mW, and a minimum received power requirement of $P_{min} = -100$ dBm.

24. Consider a cellular system where path loss follows the simplified model with $\gamma = 6$, and there is also log
normal shadowing with $\sigma = 8$ dB. If the received power at the cell boundary due to path loss is 20 dB higher
than the minimum required received power for nonoutage, find the cell coverage area.

25. In microcells path loss exponents usually range from 2 to 6, and shadowing standard deviation typically
ranges from 4 to 12. Assuming a cellular system where the received power due to path loss at the cell
boundary equals the desired level for nonoutage, find the path loss and shadowing parameters within these
ranges that yield the best coverage area and the worst coverage. What is the coverage area when these
parameters are in the middle of their typical ranges?

Chapter 3

Statistical Multipath Channel Models 



In this chapter we examine fading models for the constructive and destructive addition of different multipath
components introduced by the channel. While these multipath effects are captured in the ray-tracing models from
Chapter 2 for deterministic channels, in practice deterministic channel models are rarely available, and thus we
must characterize multipath channels statistically. In this chapter we model the multipath channel by a random
time-varying impulse response. We will develop a statistical characterization of this channel model and describe
its important properties.

If a single pulse is transmitted over a multipath channel the received signal will appear as a pulse train, with
each pulse in the train corresponding to the LOS component or a distinct multipath component associated with 
a distinct scatterer or cluster of scatterers. An important characteristic of a multipath channel is the time delay 
spread it causes to the received signal. This delay spread equals the time delay between the arrival of the first 
received signal component (LOS or multipath) and the last received signal component associated with a single 
transmitted pulse. If the delay spread is small compared to the inverse of the signal bandwidth, then there is little 
time spreading in the received signal. However, when the delay spread is relatively large, there is significant time 
spreading of the received signal which can lead to substantial signal distortion. 

Another characteristic of the multipath channel is its time-varying nature. This time variation arises because 
either the transmitter or the receiver is moving, and therefore the location of reflectors in the transmission path, 
which give rise to multipath, will change over time. Thus, if we repeatedly transmit pulses from a moving trans- 
mitter, we will observe changes in the amplitudes, delays, and the number of multipath components corresponding 
to each pulse. However, these changes occur over a much larger time scale than the fading due to constructive and 
destructive addition of multipath components associated with a fixed set of scatterers. We will first use a generic 
time-varying channel impulse response to capture both fast and slow channel variations. We will then restrict this 
model to narrowband fading, where the channel bandwidth is small compared to the inverse delay spread. For 
this narrowband model we will assume a quasi-static environment with a fixed number of multipath components 
each with fixed path loss and shadowing. For this quasi-static environment we then characterize the variations over 
short distances (small-scale variations) due to the constructive and destructive addition of multipath components. 
We also characterize the statistics of wideband multipath channels using two-dimensional transforms based on the 
underlying time-varying impulse response. Discrete-time and space-time channel models are also discussed. 

3.1 Time-Varying Channel Impulse Response 

Let the transmitted signal be as in Chapter 2: 

$$s(t) = \Re\{u(t)e^{j2\pi f_c t}\} = \Re\{u(t)\}\cos(2\pi f_c t) - \Im\{u(t)\}\sin(2\pi f_c t), \tag{3.1}$$

where $u(t)$ is the complex envelope of $s(t)$ with bandwidth $B_u$ and $f_c$ is its carrier frequency. The corresponding
received signal is the sum of the line-of-sight (LOS) path and all resolvable multipath components: 



$$r(t) = \Re\left\{\sum_{n=0}^{N(t)} \alpha_n(t)\, u(t - \tau_n(t))\, e^{j(2\pi f_c(t - \tau_n(t)) + \phi_{D_n})}\right\}, \tag{3.2}$$

where $n = 0$ corresponds to the LOS path. The unknowns in this expression are the number of resolvable multipath
components $N(t)$, discussed in more detail below, and for the LOS path and each multipath component, its path
length $r_n(t)$ and corresponding delay $\tau_n(t) = r_n(t)/c$, Doppler phase shift $\phi_{D_n}(t)$ and amplitude $\alpha_n(t)$.

The nth resolvable multipath component may correspond to the multipath associated with a single reflector
or with multiple reflectors clustered together that generate multipath components with similar delays, as shown
in Figure 3.1. If each multipath component corresponds to just a single reflector then its corresponding
amplitude $\alpha_n(t)$ is based on the path loss and shadowing associated with that multipath component, its phase change
associated with delay $\tau_n(t)$ is $e^{-j2\pi f_c\tau_n(t)}$, and its Doppler shift is $f_{D_n}(t) = v\cos\theta_n(t)/\lambda$ for $\theta_n(t)$ its angle
of arrival. This Doppler frequency shift leads to a Doppler phase shift of $\phi_{D_n} = \int_t 2\pi f_{D_n}(t)\,dt$. Suppose,
however, that the nth multipath component results from a reflector cluster$^1$. We say that two multipath components
with delay $\tau_1$ and $\tau_2$ are resolvable if their delay difference significantly exceeds the inverse signal bandwidth:
$|\tau_1 - \tau_2| \gg B_u^{-1}$. Multipath components that do not satisfy this resolvability criterion cannot be separated out
at the receiver, since $u(t - \tau_1) \approx u(t - \tau_2)$, and thus these components are nonresolvable. These nonresolvable
components are combined into a single multipath component with delay $\tau \approx \tau_1 \approx \tau_2$ and an amplitude and phase
corresponding to the sum of the different components. The amplitude of this summed signal will typically undergo
fast variations due to the constructive and destructive combining of the nonresolvable multipath components. In
general wideband channels have resolvable multipath components so that each term in the summation of (3.2)
corresponds to a single reflection or multiple nonresolvable components combined together, whereas narrowband
channels tend to have nonresolvable multipath components contributing to each term in (3.2).
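The Doppler shift and the resolvability criterion in this paragraph can be turned into two small helper functions. This is a sketch only: the function names are illustrative, the speed of light is rounded to $3 \times 10^8$ m/s, and the `margin` factor used to interpret "significantly exceeds" is an assumption, since the text gives no numerical threshold.

```python
import math

SPEED_OF_LIGHT = 3.0e8  # m/s, rounded


def doppler_shift_hz(speed_mps: float, angle_rad: float, carrier_hz: float) -> float:
    """Doppler shift f_D = v * cos(theta) / lambda for arrival angle theta."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return speed_mps * math.cos(angle_rad) / wavelength


def are_resolvable(tau1_s: float, tau2_s: float, bandwidth_hz: float, margin: float = 10.0) -> bool:
    """True if |tau1 - tau2| exceeds the inverse bandwidth by the assumed margin factor."""
    return abs(tau1_s - tau2_s) > margin / bandwidth_hz


# Hypothetical numbers: 30 m/s toward the arriving wave, 900 MHz carrier.
print(doppler_shift_hz(30.0, 0.0, 900e6))

# Two paths 1 microsecond apart: resolvable at 20 MHz bandwidth, not at 100 kHz.
print(are_resolvable(0.0, 1e-6, 20e6))
print(are_resolvable(0.0, 1e-6, 100e3))
```

The same pair of delays is resolvable for the wideband signal and nonresolvable for the narrowband one, which is the distinction drawn at the end of the paragraph.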



Figure 3.1: A Single Reflector and A Reflector Cluster.

Since the parameters $\alpha_n(t)$, $\tau_n(t)$, and $\phi_{D_n}(t)$ associated with each resolvable multipath component change
over time, they are characterized as random processes which we assume to be both stationary and ergodic. Thus,
the received signal is also a stationary and ergodic random process. For wideband channels, where each term in
(3.2) corresponds to a single reflector, these parameters change slowly as the propagation environment changes.
For narrowband channels, where each term in (3.2) results from the sum of nonresolvable multipath components,
the parameters can change quickly, on the order of a signal wavelength, due to constructive and destructive addition
of the different components.

$^1$ Equivalently, a single “rough” reflector can create different multipath components with slightly different delays.

We can simplify $r(t)$ by letting

$$\phi_n(t) = 2\pi f_c\tau_n(t) - \phi_{D_n}. \tag{3.3}$$

Then the received signal can be rewritten as

$$r(t) = \Re\left\{\left[\sum_{n=0}^{N(t)} \alpha_n(t) e^{-j\phi_n(t)} u(t - \tau_n(t))\right] e^{j2\pi f_c t}\right\}. \tag{3.4}$$



Since $\alpha_n(t)$ is a function of path loss and shadowing while $\phi_n(t)$ depends on delay and Doppler, we typically
assume that these two random processes are independent.

The received signal $r(t)$ is obtained by convolving the baseband input signal $u(t)$ with the equivalent lowpass
time-varying channel impulse response $c(\tau, t)$ of the channel and then upconverting to the carrier frequency$^2$:

$$r(t) = \Re\left\{\left(\int_{-\infty}^{\infty} c(\tau, t) u(t - \tau)\, d\tau\right) e^{j2\pi f_c t}\right\}. \tag{3.5}$$

Note that $c(\tau, t)$ has two time parameters: the time $t$ when the impulse response is observed at the receiver, and
the time $t - \tau$ when the impulse is launched into the channel relative to the observation time $t$. If at time $t$ there
is no physical reflector in the channel with multipath delay $\tau_n(t) = \tau$ then $c(\tau, t) = 0$. While the definition of
the time-varying channel impulse response might seem counterintuitive at first, $c(\tau, t)$ must be defined in this way
to be consistent with the special case of time-invariant channels. Specifically, for time-invariant channels we have
$c(\tau, t) = c(\tau, t + T)$, i.e. the response at time $t$ to an impulse at time $t - \tau$ equals the response at time $t + T$ to an
impulse at time $t + T - \tau$. Setting $T = -t$, we get that $c(\tau, t) = c(\tau, t - t) = c(\tau)$, where $c(\tau)$ is the standard
time-invariant channel impulse response: the response at time $\tau$ to an impulse at zero or, equivalently, the response
at time zero to an impulse at time $-\tau$.

We see from (3.4) and (3.5) that $c(\tau, t)$ must be given by

$$c(\tau, t) = \sum_{n=0}^{N(t)} \alpha_n(t) e^{-j\phi_n(t)} \delta(\tau - \tau_n(t)), \tag{3.6}$$

where $c(\tau, t)$ represents the equivalent lowpass response of the channel at time $t$ to an impulse at time $t - \tau$.
Substituting (3.6) back into (3.5) yields (3.4), thereby confirming that (3.6) is the channel’s equivalent lowpass
time-varying impulse response:

$$\begin{aligned}
r(t) &= \Re\left\{\left[\int_{-\infty}^{\infty} c(\tau, t) u(t - \tau)\, d\tau\right] e^{j2\pi f_c t}\right\} \\
&= \Re\left\{\left[\int_{-\infty}^{\infty} \sum_{n=0}^{N(t)} \alpha_n(t) e^{-j\phi_n(t)} \delta(\tau - \tau_n(t)) u(t - \tau)\, d\tau\right] e^{j2\pi f_c t}\right\} \\
&= \Re\left\{\left[\sum_{n=0}^{N(t)} \alpha_n(t) e^{-j\phi_n(t)} \left(\int_{-\infty}^{\infty} \delta(\tau - \tau_n(t)) u(t - \tau)\, d\tau\right)\right] e^{j2\pi f_c t}\right\} \\
&= \Re\left\{\left[\sum_{n=0}^{N(t)} \alpha_n(t) e^{-j\phi_n(t)} u(t - \tau_n(t))\right] e^{j2\pi f_c t}\right\},
\end{aligned}$$

$^2$ See Appendix A for discussion of the lowpass equivalent representation for bandpass signals and systems.

where the last equality follows from the sifting property of delta functions: $\int \delta(\tau - \tau_n(t)) u(t - \tau)\, d\tau = \delta(t -
\tau_n(t)) * u(t) = u(t - \tau_n(t))$. Some channel models assume a continuum of multipath delays, in which case the sum
in (3.6) becomes an integral which simplifies to a time-varying complex amplitude associated with each multipath
delay $\tau$:

$$c(\tau, t) = \int \alpha(\xi, t) e^{-j\phi(\xi, t)} \delta(\tau - \xi)\, d\xi = \alpha(\tau, t) e^{-j\phi(\tau, t)}. \tag{3.7}$$

To give a concrete example of a time-varying impulse response, consider the system shown in Figure 3.2, where
each multipath component corresponds to a single reflector. At time $t_1$ there are three multipath components
associated with the received signal with amplitude, phase, and delay triple $(\alpha_i, \phi_i, \tau_i)$, $i = 1, 2, 3$. Thus, impulses
that were launched into the channel at time $t_1 - \tau_i$, $i = 1, 2, 3$ will all be received at time $t_1$, and impulses launched
into the channel at any other time will not be received at $t_1$ (because there is no multipath component with the
corresponding delay). The time-varying impulse response corresponding to $t_1$ equals

$$c(\tau, t_1) = \sum_{n=0}^{2} \alpha_n e^{-j\phi_n} \delta(\tau - \tau_n) \tag{3.8}$$

and the channel impulse response for $t = t_1$ is shown in Figure 3.3. Figure 3.2 also shows the system at time $t_2$,
where there are two multipath components associated with the received signal with amplitude, phase, and delay
triple $(\alpha'_i, \phi'_i, \tau'_i)$, $i = 1, 2$. Thus, impulses that were launched into the channel at time $t_2 - \tau'_i$, $i = 1, 2$ will all
be received at time $t_2$, and impulses launched into the channel at any other time will not be received at $t_2$. The
time-varying impulse response at $t_2$ equals

$$c(\tau, t_2) = \sum_{n=0}^{1} \alpha'_n e^{-j\phi'_n} \delta(\tau - \tau'_n) \tag{3.9}$$

and is also shown in Figure 3.3.
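The snapshot responses (3.8) and (3.9) can be illustrated numerically: by the sifting property, the baseband output at the observation time is a sum of scaled, phase-rotated, delayed copies of the input. The sketch below uses made-up path triples and a rectangular test pulse, since the actual values in Figure 3.2 are not given; all names and numbers are hypothetical.

```python
import cmath
import math


def channel_output(paths, u, t):
    """Baseband output sum_n alpha_n * exp(-j*phi_n) * u(t - tau_n) for one channel snapshot.

    paths is a list of (alpha, phi, tau) triples describing c(tau, t) at the observation time.
    """
    return sum(alpha * cmath.exp(-1j * phi) * u(t - tau) for alpha, phi, tau in paths)


def rect_pulse(t, width=1.0e-7):
    """Unit-amplitude rectangular pulse of the given width starting at t = 0."""
    return 1.0 if 0.0 <= t < width else 0.0


# Hypothetical snapshots: three paths at t1, two paths at t2 (amplitude, phase, delay in seconds).
paths_t1 = [(1.0, 0.0, 0.0), (0.5, math.pi / 2, 2.0e-7), (0.25, math.pi, 5.0e-7)]
paths_t2 = [(1.0, 0.0, 0.0), (0.4, math.pi / 4, 3.0e-7)]

# Probe the t1 snapshot: early on only the first path contributes, later only the second.
print(channel_output(paths_t1, rect_pulse, 0.5e-7))
print(channel_output(paths_t1, rect_pulse, 2.5e-7))
print(channel_output(paths_t2, rect_pulse, 3.5e-7))
```

Because the pulse is shorter than the spacing between the assumed delays, each probe time picks out a single path, which mirrors the pulse-train picture of a resolvable multipath channel.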

If the channel is time-invariant then the time-varying parameters in $c(\tau, t)$ become constant, and $c(\tau, t) = c(\tau)$
is just a function of $\tau$:

$$c(\tau) = \sum_{n=0}^{N} \alpha_n e^{-j\phi_n} \delta(\tau - \tau_n), \tag{3.10}$$

for channels with discrete multipath components, and $c(\tau) = \alpha(\tau)e^{-j\phi(\tau)}$ for channels with a continuum of
multipath components. For stationary channels the response to an impulse at time $t_1$ is just a shifted version of its
response to an impulse at time $t_2$, $t_1 \neq t_2$.

Figure 3.2: System Multipath at Two Different Measurement Times.

Figure 3.3: Response of Nonstationary Channel.



Example 3.1: Consider a wireless LAN operating in a factory near a conveyor belt. The transmitter and receiver
have a LOS path between them with gain $\alpha_0$, phase $\phi_0$ and delay $\tau_0$. Every $T_0$ seconds a metal item comes down
the conveyor belt, creating an additional reflected signal path in addition to the LOS path with gain $\alpha_1$, phase $\phi_1$
and delay $\tau_1$. Find the time-varying impulse response $c(\tau, t)$ of this channel.

Solution: For $t \neq nT_0$, $n = 1, 2, \ldots$ the channel impulse response corresponds to just the LOS path. For $t = nT_0$
the channel impulse response has both the LOS and reflected paths. Thus, $c(\tau, t)$ is given by

$$c(\tau, t) = \begin{cases} \alpha_0 e^{j\phi_0}\delta(\tau - \tau_0) & t \neq nT_0 \\ \alpha_0 e^{j\phi_0}\delta(\tau - \tau_0) + \alpha_1 e^{j\phi_1}\delta(\tau - \tau_1) & t = nT_0 \end{cases}$$



Note that for typical carrier frequencies, the nth multipath component will have $f_c\tau_n(t) \gg 1$. For example,
with $f_c = 1$ GHz and $\tau_n = 50$ ns (a typical value for an indoor system), $f_c\tau_n = 50 \gg 1$. Outdoor wireless
systems have multipath delays much greater than 50 ns, so this property also holds for these systems. If $f_c\tau_n(t) \gg
1$ then a small change in the path delay $\tau_n(t)$ can lead to a very large phase change in the nth multipath component
with phase $\phi_n(t) = 2\pi f_c\tau_n(t) - \phi_{D_n} - \phi_0$. Rapid phase changes in each multipath component give rise to
constructive and destructive addition of the multipath components comprising the received signal, which in turn
causes rapid variation in the received signal strength. This phenomenon, called fading, will be discussed in more
detail in subsequent sections.
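The sensitivity of phase to delay can be seen with the numbers used in this paragraph. The sketch below assumes two unit-amplitude paths with zero Doppler phase and zero phase offset, and a hypothetical delay change of 0.5 ns on one path; the function names are illustrative.

```python
import cmath
import math


def path_phase(carrier_hz, delay_s, doppler_phase=0.0, phase_offset=0.0):
    """Phase phi_n = 2*pi*f_c*tau_n - phi_D - phi_0 of one multipath component."""
    return 2.0 * math.pi * carrier_hz * delay_s - doppler_phase - phase_offset


def two_path_amplitude(carrier_hz, tau0_s, tau1_s):
    """Magnitude of the sum of two unit-amplitude components with the given delays."""
    return abs(cmath.exp(-1j * path_phase(carrier_hz, tau0_s))
               + cmath.exp(-1j * path_phase(carrier_hz, tau1_s)))


fc = 1.0e9  # 1 GHz carrier, as in the text
print(fc * 50e-9)  # f_c * tau_n for a 50 ns delay

# Equal delays add constructively; shifting one path by 0.5 ns (half a carrier period)
# rotates its phase by pi and the two components cancel.
print(two_path_amplitude(fc, 50e-9, 50e-9))
print(two_path_amplitude(fc, 50e-9, 50.5e-9))
```

A delay change of 0.5 ns corresponds to a path length change of about 15 cm, so movement on the order of a wavelength is enough to swing the received amplitude between its maximum and a deep null.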

The impact of multipath on the received signal depends on whether the spread of time delays associated with the LOS and different multipath components is large or small relative to the inverse signal bandwidth. If this channel delay spread is small then the LOS and all multipath components are typically nonresolvable, leading to the narrowband fading model described in the next section. If the delay spread is large then the LOS and all multipath components are typically resolvable into some number of discrete components, leading to the wideband fading model of Section 3.3. Note that some of the discrete components in the wideband model are comprised of nonresolvable components. The delay spread is typically measured relative to the received signal component to which the demodulator is synchronized. Thus, for the time-invariant channel model of (3.10), if the demodulator synchronizes to the LOS signal component, which has the smallest delay $\tau_0$, then the delay spread is a constant given by $T_m = \max_n \tau_n - \tau_0$. However, if the demodulator synchronizes to a multipath component with delay equal to the mean delay $\bar{\tau}$ then the delay spread is given by $T_m = \max_n |\tau_n - \bar{\tau}|$. In time-varying channels the multipath delays vary with time, so the delay spread $T_m$ becomes a random variable. Moreover, some received multipath components have significantly lower power than others, so it is not clear how the delay associated with such components should be used in the characterization of delay spread. In particular, if the power of a multipath component is below the noise floor then it should not significantly contribute to the delay spread. These issues are typically dealt with by characterizing the delay spread relative to the channel power delay profile, defined in Section 3.3.1. Specifically, two common characterizations of channel delay spread, average delay spread and rms delay spread, are determined from the power delay profile. Other characterizations of delay spread, such as excess delay spread, the delay window, and the delay interval, are sometimes used as well [6, Chapter 5.4.1], [28, Chapter 6.7.1]. The exact characterization of delay spread is not that important for understanding the general impact of delay spread on multipath channels, as long as the characterization roughly measures the delay associated with significant multipath components. In our development below any reasonable characterization of delay spread $T_m$ can be used, although we will typically use the rms delay spread. This is the most common characterization since, assuming the demodulator synchronizes to a signal component at the average delay spread, the rms delay spread is a good measure of the variation about this average. Channel delay spread is highly dependent on the propagation environment. In indoor channels delay spread typically ranges from 10 to 1000 nanoseconds, in suburbs it ranges from 200 to 2000 nanoseconds, and in urban areas it ranges from 1 to 30 microseconds [6].
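The two time-invariant definitions of delay spread above can be compared on a small example. This is a minimal sketch: the four path delays and the 100 kHz bandwidth are hypothetical, and the mean delay is taken as an unweighted average purely for illustration (the power-weighted definitions are deferred to Section 3.3.1).

```python
def delay_spread_from_los(delays):
    """T_m = max_n tau_n - tau_0, where tau_0 is the smallest (LOS) delay."""
    return max(delays) - min(delays)

def delay_spread_from_reference(delays, reference):
    """T_m = max_n |tau_n - reference| for a demodulator locked to `reference`."""
    return max(abs(tau - reference) for tau in delays)

# Hypothetical path delays in nanoseconds (assumed values, LOS first).
delays_ns = [100.0, 130.0, 180.0, 260.0]

# Unweighted mean delay: a simplifying assumption for this sketch only.
mean_delay_ns = sum(delays_ns) / len(delays_ns)

tm_los_ns = delay_spread_from_los(delays_ns)
tm_mean_ns = delay_spread_from_reference(delays_ns, mean_delay_ns)

# Compare the delay spread with the inverse bandwidth of a hypothetical signal.
bandwidth_hz = 100e3
ratio = tm_los_ns * 1e-9 * bandwidth_hz   # T_m * B; much less than 1 means narrowband

print(f"T_m relative to LOS  = {tm_los_ns:.1f} ns")
print(f"T_m relative to mean = {tm_mean_ns:.1f} ns")
print(f"T_m * B = {ratio:.3f}")
```

The two characterizations differ by less than a factor of two here, which is in line with the remark that the exact characterization matters little as long as it roughly measures the delay of the significant components.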



3.2 Narrowband Fading Models 



Suppose the delay spread $T_m$ of a channel is small relative to the inverse of the bandwidth $B$ of the transmitted signal, i.e. $T_m \ll B^{-1}$. As discussed above, the delay spread $T_m$ for time-varying channels is usually characterized by the rms delay spread, but can also be characterized in other ways. Under most delay spread characterizations $T_m \ll B^{-1}$ implies that the delay associated with the ith multipath component satisfies $\tau_i \leq T_m\ \forall i$, so $u(t - \tau_i) \approx u(t)\ \forall i$ and we can rewrite (3.4) as

$$r(t) = \Re\left\{ u(t) e^{j2\pi f_c t} \left( \sum_n \alpha_n(t) e^{-j\phi_n(t)} \right) \right\}. \tag{3.11}$$



Equation (3.11) differs from the original transmitted signal by the complex scale factor in parentheses. This scale factor is independent of the transmitted signal $s(t)$ or, equivalently, the baseband signal $u(t)$, as long as the narrowband assumption $T_m \ll 1/B$ is satisfied. In order to characterize the random scale factor caused by the multipath we choose $s(t)$ to be an unmodulated carrier with random phase offset $\phi_0$:

$$s(t) = \Re\{e^{j(2\pi f_c t + \phi_0)}\} = \cos(2\pi f_c t + \phi_0), \tag{3.12}$$

which is narrowband for any $T_m$.

With this assumption the received signal becomes

$$r(t) = \Re\left\{ \left[ \sum_{n=0}^{N(t)} \alpha_n(t) e^{-j\phi_n(t)} \right] e^{j2\pi f_c t} \right\} = r_I(t)\cos 2\pi f_c t + r_Q(t)\sin 2\pi f_c t, \tag{3.13}$$

where the in-phase and quadrature components are given by

$$r_I(t) = \sum_{n=0}^{N(t)} \alpha_n(t)\cos\phi_n(t) \tag{3.14}$$

and

$$r_Q(t) = \sum_{n=0}^{N(t)} \alpha_n(t)\sin\phi_n(t), \tag{3.15}$$

where the phase term

$$\phi_n(t) = 2\pi f_c\tau_n(t) - \phi_{D_n} - \phi_0 \tag{3.16}$$

now incorporates the phase offset $\phi_0$ as well as the effects of delay and Doppler.

If $N(t)$ is large we can invoke the Central Limit Theorem and the fact that $\alpha_n(t)$ and $\phi_n(t)$ are stationary and ergodic to approximate $r_I(t)$ and $r_Q(t)$ as jointly Gaussian random processes. The Gaussian property is also true for small $N$ if the $\alpha_n(t)$ are Rayleigh distributed and the $\phi_n(t)$ are uniformly distributed on $[-\pi, \pi]$. This happens when the nth multipath component results from a reflection cluster with a large number of nonresolvable multipath components [1].
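The Gaussian approximation can be sanity-checked by direct simulation. The sketch below draws many realizations of the in-phase and quadrature sums with independent uniform phases and fixed, equal amplitudes. The number of paths, the trial count, the seed, and the normalization $\sum_n \alpha_n^2 = P_r = 1$ are assumptions chosen for illustration.

```python
import math
import random

def sample_iq(amplitudes, rng):
    """Draw one (r_I, r_Q) pair with independent phases uniform on [-pi, pi]."""
    r_i = 0.0
    r_q = 0.0
    for a in amplitudes:
        phi = rng.uniform(-math.pi, math.pi)
        r_i += a * math.cos(phi)
        r_q += a * math.sin(phi)
    return r_i, r_q

rng = random.Random(1)          # fixed seed so the run is repeatable
n_paths = 20                    # assumed number of multipath components
p_r = 1.0                       # assumed total power, sum of alpha_n^2
amplitudes = [math.sqrt(p_r / n_paths)] * n_paths

trials = 20000
samples = [sample_iq(amplitudes, rng) for _ in range(trials)]

mean_i = sum(s[0] for s in samples) / trials
mean_q = sum(s[1] for s in samples) / trials
var_i = sum(s[0] ** 2 for s in samples) / trials
var_q = sum(s[1] ** 2 for s in samples) / trials
cross = sum(s[0] * s[1] for s in samples) / trials

print(f"E[r_I] ~ {mean_i:+.4f}, E[r_Q] ~ {mean_q:+.4f}")
print(f"E[r_I^2] ~ {var_i:.4f}, E[r_Q^2] ~ {var_q:.4f} (expected {p_r / 2:.2f})")
print(f"E[r_I r_Q] ~ {cross:+.4f}")
```

Each component should come out with mean near zero and variance near $P_r/2$, and the two components should be nearly uncorrelated.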



3.2.1 Autocorrelation, Cross Correlation, and Power Spectral Density 

We now derive the autocorrelation and cross correlation of the in-phase and quadrature received signal components $r_I(t)$ and $r_Q(t)$. Our derivations are based on some key assumptions which generally apply to propagation models without a dominant LOS component. Thus, these formulas are not typically valid when a dominant LOS component exists. We assume throughout this section that the amplitude $\alpha_n(t)$, multipath delay $\tau_n(t)$ and Doppler frequency $f_{D_n}(t)$ are changing slowly enough such that they are constant over the time intervals of interest: $\alpha_n(t) \approx \alpha_n$, $\tau_n(t) \approx \tau_n$, and $f_{D_n}(t) \approx f_{D_n}$. This will be true when each of the resolvable multipath components is associated with a single reflector. With this assumption the Doppler phase shift (see footnote 3) is $\phi_{D_n}(t) = \int_t 2\pi f_{D_n}\,dt = 2\pi f_{D_n} t$, and the phase of the nth multipath component becomes $\phi_n(t) = 2\pi f_c\tau_n - 2\pi f_{D_n} t - \phi_0$.

We now make a key assumption: we assume that for the nth multipath component the term $2\pi f_c\tau_n$ in $\phi_n(t)$ changes rapidly relative to all other phase terms in this expression. This is a reasonable assumption since $f_c$ is large and hence the term $2\pi f_c\tau_n$ can go through a 360 degree rotation for a small change in multipath delay $\tau_n$. Under this assumption $\phi_n(t)$ is uniformly distributed on $[-\pi, \pi]$. Thus

$$E[r_I(t)] = E\left[\sum_n \alpha_n\cos\phi_n(t)\right] = \sum_n E[\alpha_n]E[\cos\phi_n(t)] = 0, \tag{3.17}$$

where the second equality follows from the independence of $\alpha_n$ and $\phi_n$ and the last equality follows from the uniform distribution on $\phi_n$. Similarly we can show that $E[r_Q(t)] = 0$. Thus, the received signal also has $E[r(t)] = 0$, i.e. it is a zero-mean Gaussian process. When there is a dominant LOS component in the channel the phase of the received signal is dominated by the phase of the LOS component, which can be determined at the receiver, so the assumption of a random uniform phase no longer holds.

3 We assume a Doppler phase shift at t = 0 of zero for simplicity, since this phase offset will not affect the analysis.

Consider now the correlation between the in-phase and quadrature components. Using the independence of $\alpha_n$ and $\phi_n$, the independence of $\phi_n$ and $\phi_m$, $n \neq m$, and the uniform distribution of $\phi_n$ we get that

$$\begin{aligned} E[r_I(t)r_Q(t)] &= E\left[\sum_n \alpha_n\cos\phi_n(t)\sum_m \alpha_m\sin\phi_m(t)\right] \\ &= \sum_n\sum_m E[\alpha_n\alpha_m]E[\cos\phi_n(t)\sin\phi_m(t)] \\ &= \sum_n E[\alpha_n^2]E[\cos\phi_n(t)\sin\phi_n(t)] \\ &= 0. \end{aligned} \tag{3.18}$$

Thus, $r_I(t)$ and $r_Q(t)$ are uncorrelated and, since they are jointly Gaussian processes, this means they are independent.

Following a similar derivation as in (3.18) we obtain the autocorrelation of $r_I(t)$ as

$$A_{r_I}(t, \tau) = E[r_I(t)r_I(t + \tau)] = \sum_n E[\alpha_n^2]E[\cos\phi_n(t)\cos\phi_n(t + \tau)]. \tag{3.19}$$

Now making the substitution $\phi_n(t) = 2\pi f_c\tau_n - 2\pi f_{D_n}t - \phi_0$ and $\phi_n(t + \tau) = 2\pi f_c\tau_n - 2\pi f_{D_n}(t + \tau) - \phi_0$ we get

$$E[\cos\phi_n(t)\cos\phi_n(t + \tau)] = .5E[\cos 2\pi f_{D_n}\tau] + .5E[\cos(4\pi f_c\tau_n - 4\pi f_{D_n}t - 2\pi f_{D_n}\tau - 2\phi_0)]. \tag{3.20}$$

Since $2\pi f_c\tau_n$ changes rapidly relative to all other phase terms and is uniformly distributed, the second expectation term in (3.20) goes to zero, and thus

$$A_{r_I}(t, \tau) = .5\sum_n E[\alpha_n^2]E[\cos(2\pi f_{D_n}\tau)] = .5\sum_n E[\alpha_n^2]\cos(2\pi v\tau\cos\theta_n/\lambda), \tag{3.21}$$

since $f_{D_n} = v\cos\theta_n/\lambda$ is assumed fixed. Note that $A_{r_I}(t, \tau)$ depends only on $\tau$, $A_{r_I}(t, \tau) = A_{r_I}(\tau)$, and thus $r_I(t)$ is a wide-sense stationary (WSS) random process.

Using a similar derivation we can show that the quadrature component is also WSS with autocorrelation $A_{r_Q}(\tau) = A_{r_I}(\tau)$. In addition, the cross correlation between the in-phase and quadrature components depends only on the time difference $\tau$ and is given by

$$A_{r_I,r_Q}(t, \tau) = A_{r_I,r_Q}(\tau) = E[r_I(t)r_Q(t + \tau)] = -.5\sum_n E[\alpha_n^2]\sin(2\pi v\tau\cos\theta_n/\lambda) = -E[r_Q(t)r_I(t + \tau)]. \tag{3.22}$$

Using these results we can show that the received signal $r(t) = r_I(t)\cos(2\pi f_c t) + r_Q(t)\sin(2\pi f_c t)$ is also WSS with autocorrelation

$$A_r(\tau) = E[r(t)r(t + \tau)] = A_{r_I}(\tau)\cos(2\pi f_c\tau) + A_{r_I,r_Q}(\tau)\sin(2\pi f_c\tau). \tag{3.23}$$

In order to further simplify (3.21) and (3.22), we must make additional assumptions about the propagation environment. We will focus on the uniform scattering environment introduced by Clarke [4] and further developed by Jakes [5, Chapter 1]. In this model, the channel consists of many scatterers densely packed with respect to angle, as shown in Fig. 3.4. Thus, we assume $N$ multipath components with angle of arrival $\theta_n = n\Delta\theta$, where $\Delta\theta = 2\pi/N$. We also assume that each multipath component has the same received power, so $E[\alpha_n^2] = 2P_r/N$, where $P_r$ is the total received power. Then (3.21) becomes

$$A_{r_I}(\tau) = \frac{P_r}{N}\sum_{n=1}^{N}\cos(2\pi v\tau\cos n\Delta\theta/\lambda). \tag{3.24}$$

Now making the substitution $N = 2\pi/\Delta\theta$ yields

$$A_{r_I}(\tau) = \frac{P_r}{2\pi}\sum_{n=1}^{N}\cos(2\pi v\tau\cos n\Delta\theta/\lambda)\Delta\theta. \tag{3.25}$$

We now take the limit as the number of scatterers grows to infinity, which corresponds to uniform scattering from all directions. Then $N \to \infty$, $\Delta\theta \to 0$, and the summation in (3.25) becomes an integral:

$$A_{r_I}(\tau) = \frac{P_r}{2\pi}\int_0^{2\pi}\cos(2\pi v\tau\cos\theta/\lambda)\,d\theta = P_r J_0(2\pi f_D\tau), \tag{3.26}$$

where $f_D = v/\lambda$ and

$$J_0(x) = \frac{1}{\pi}\int_0^{\pi} e^{-jx\cos\theta}\,d\theta$$

is a Bessel function of the 0th order (see footnote 4). Similarly, for this uniform scattering environment,

$$A_{r_I,r_Q}(\tau) = -\frac{P_r}{2\pi}\int_0^{2\pi}\sin(2\pi v\tau\cos\theta/\lambda)\,d\theta = 0. \tag{3.27}$$

A plot of $J_0(2\pi f_D\tau)$ is shown in Figure 3.5. There are several interesting observations from this plot. First we see that the autocorrelation is zero for $f_D\tau \approx .4$ or, equivalently, for $v\tau \approx .4\lambda$. Thus, the signal decorrelates over a distance of approximately one half wavelength, under the uniform $\theta_n$ assumption. This approximation is commonly used as a rule of thumb to determine many system parameters of interest. For example, we will see in Chapter 7 that independent fading paths can be exploited by antenna diversity to remove some of the negative effects of fading. The antenna spacing must be such that each antenna receives an independent fading path and therefore, based on our analysis here, an antenna spacing of $.4\lambda$ should be used. Another interesting characteristic of this plot is that the signal recorrelates after it becomes uncorrelated. Thus, we cannot assume that the signal remains independent from its initial value at $d = 0$ for separation distances greater than $.4\lambda$. As a result, a Markov model is not completely accurate for Rayleigh fading, because of this recorrelation property. However, in many system analyses a correlation below .5 does not significantly degrade performance relative to uncorrelated fading [8, Chapter 9.6.5]. For such studies the fading process can be modeled as Markov by assuming that once the correlation is close to zero, i.e. the separation distance is greater than a half wavelength, the signal remains decorrelated at all larger distances.
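The passage from the finite sum (3.24) to the Bessel function in (3.26), and the location of the first zero of the autocorrelation, can be checked numerically. This is a sketch using only the integral definition of $J_0$ given above; the midpoint-rule step count, the bisection bracket, and the function names are choices made for this illustration.

```python
import math

def bessel_j0(x, steps=400):
    """J_0(x) = (1/pi) * integral over [0, pi] of cos(x cos(theta)), midpoint rule.

    The imaginary part of exp(-j x cos(theta)) integrates to zero by symmetry.
    """
    h = math.pi / steps
    total = sum(math.cos(x * math.cos((k + 0.5) * h)) for k in range(steps))
    return total * h / math.pi

def autocorr_finite(fd_tau, n_paths):
    """Equation (3.24) normalized by P_r, with theta_n = n * 2*pi/N."""
    dtheta = 2.0 * math.pi / n_paths
    total = sum(math.cos(2.0 * math.pi * fd_tau * math.cos(n * dtheta))
                for n in range(1, n_paths + 1))
    return total / n_paths

def first_zero(lo=0.3, hi=0.5, iters=50):
    """Bisection for the first zero of J_0(2*pi*x), where x = f_D * tau."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bessel_j0(2.0 * math.pi * lo) * bessel_j0(2.0 * math.pi * mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

fd_tau = 0.2
finite = autocorr_finite(fd_tau, 64)          # 64 equal-power scatterers
limit = bessel_j0(2.0 * math.pi * fd_tau)     # uniform scattering limit
zero = first_zero()

print(f"(3.24) with N = 64: {finite:.6f}")
print(f"J_0(2 pi f_D tau):  {limit:.6f}")
print(f"first zero at f_D * tau = {zero:.4f}")
```

The first zero falls at $f_D\tau \approx .38$, which the text rounds to .4.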

4 Note that (3.26) can also be derived by assuming $2\pi v\tau\cos\theta_n/\lambda$ in (3.21) and (3.22) is random with $\theta_n$ uniformly distributed, and then taking expectation with respect to $\theta_n$. However, based on the underlying physical model, $\theta_n$ can only be uniformly distributed in a dense scattering environment. So the derivations are equivalent.

Figure 3.4: Dense Scattering Environment



The power spectral densities (PSDs) of $r_I(t)$ and $r_Q(t)$, denoted by $S_{r_I}(f)$ and $S_{r_Q}(f)$, respectively, are obtained by taking the Fourier transform of their respective autocorrelation functions relative to the delay parameter $\tau$. Since these autocorrelation functions are equal, so are the PSDs. Thus

$$S_{r_I}(f) = S_{r_Q}(f) = \mathcal{F}[A_{r_I}(\tau)] = \begin{cases} \dfrac{P_r}{\pi f_D}\dfrac{1}{\sqrt{1 - (f/f_D)^2}}, & |f| \leq f_D \\ 0, & \text{else} \end{cases} \tag{3.28}$$

This PSD is shown in Figure 3.6.

To obtain the PSD of the received signal $r(t)$ under uniform scattering we use (3.23) with $A_{r_I,r_Q}(\tau) = 0$, (3.28), and simple properties of the Fourier transform to obtain

$$S_r(f) = \mathcal{F}[A_r(\tau)] = .5[S_{r_I}(f - f_c) + S_{r_I}(f + f_c)] = \begin{cases} \dfrac{P_r}{2\pi f_D}\dfrac{1}{\sqrt{1 - \left(\frac{|f| - f_c}{f_D}\right)^2}}, & \big||f| - f_c\big| \leq f_D \\ 0, & \text{else} \end{cases} \tag{3.29}$$

Note that this PSD integrates to $P_r$, the total received power.

Since the PSD models the power density associated with multipath components as a function of their Doppler frequency, it can be viewed as the distribution (pdf) of the random frequency due to Doppler associated with multipath. We see from Figure 3.6 that the PSD $S_{r_I}(f)$ goes to infinity at $f = \pm f_D$ and, consequently, the PSD $S_r(f)$ goes to infinity at $f = \pm f_c \pm f_D$. This will not be true in practice, since the uniform scattering model is just an approximation, but for environments with dense scatterers the PSD will generally be maximized at frequencies close to the maximum Doppler frequency. The intuition for this behavior comes from the nature of the cosine function and the fact that under our assumptions the PSD corresponds to the pdf of the random Doppler frequency $f_D(\theta)$. To see this, note that the uniform scattering assumption is based on many scattered paths arriving uniformly from all angles with the same average power. Thus, $\theta$ for a randomly selected path can be regarded as a uniform random variable on $[0, 2\pi]$. The distribution $p_{f_\theta}(f)$ of the random Doppler frequency $f(\theta)$ can then be obtained from the distribution of $\theta$. By definition, $p_{f_\theta}(f)$ is proportional to the density of scatterers at Doppler frequency $f$. Hence, $S_{r_I}(f)$ is also proportional to this density, and we can characterize the PSD from the pdf $p_{f_\theta}(f)$. For this characterization, in Figure 3.7 we plot $f_D(\theta) = f_D\cos(\theta) = (v/\lambda)\cos(\theta)$ along with a dotted straight-line segment approximation $\underline{f}_D(\theta)$ to $f_D(\theta)$. On the right in this figure we plot the PSD $S_{r_I}(f)$ along with a dotted straight-line segment approximation to it, $\underline{S}_{r_I}(f)$, which corresponds to the Doppler approximation $\underline{f}_D(\theta)$. We see that $\cos(\theta) \approx \pm 1$ for a relatively large range of $\theta$ values. Thus, multipath components with angles of arrival in this range of values have Doppler frequency $f_D(\theta) \approx \pm f_D$, so the power associated with all of these multipath components will add together in the PSD at $f \approx \pm f_D$. This is shown in our approximation by the fact that the segments where $\underline{f}_D(\theta) = \pm f_D$ on the left lead to delta functions at $\pm f_D$ in the pdf approximation $\underline{S}_{r_I}(f)$ on the right. The segments where $\underline{f}_D(\theta)$ has uniform slope on the left lead to the flat part of $\underline{S}_{r_I}(f)$ on the right, since there is one multipath component contributing power at each angular increment. Formulas for the autocorrelation and PSD in nonuniform scattering, corresponding to more typical microcell and indoor environments, can be found in [5, Chapter 1], [11, Chapter 2].
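The concentration of power near $\pm f_D$ can be quantified with a short calculation. For $\theta$ uniform on $[0, 2\pi]$, the probability that $|\cos\theta|$ exceeds a threshold $a$ is $(2/\pi)\arccos(a)$. The sketch below evaluates this closed form and checks it by sampling; the 0.9 threshold, the sample count, and the seed are arbitrary choices for illustration.

```python
import math
import random

def fraction_above(a):
    """P(|cos(theta)| > a) for theta uniform on [0, 2*pi]: (2/pi) * acos(a)."""
    return 2.0 * math.acos(a) / math.pi

threshold = 0.9                      # Doppler magnitude above 0.9 * f_D
exact = fraction_above(threshold)

# Monte Carlo check with a fixed seed so the run is repeatable.
rng = random.Random(7)
n_samples = 100000
hits = sum(1 for _ in range(n_samples)
           if abs(math.cos(rng.uniform(0.0, 2.0 * math.pi))) > threshold)
estimate = hits / n_samples

print(f"closed form: {exact:.4f}")
print(f"simulation:  {estimate:.4f}")
```

Roughly 29% of equal-power paths have a Doppler shift within the outer 10% of the range $[-f_D, f_D]$, consistent with the peaks of the PSD at $\pm f_D$.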

The PSD is useful in constructing simulations for the fading process. A common method for simulating the envelope of a narrowband fading process is to pass two independent white Gaussian noise sources with PSD $N_0/2$ through lowpass filters with frequency response $H(f)$ that satisfies

$$S_{r_I}(f) = S_{r_Q}(f) = \frac{N_0}{2}|H(f)|^2. \tag{3.30}$$

The filter outputs then correspond to the in-phase and quadrature components of the narrowband fading process with PSDs $S_{r_I}(f)$ and $S_{r_Q}(f)$. A similar procedure using discrete filters can be used to generate discrete fading processes. Most communication simulation packages (e.g. Matlab, COSSAP) have standard modules that simulate narrowband fading based on this method. More details on this simulation method, as well as alternative methods, can be found in [11, 6, 7].

We have now completed our model for the three characteristics of power versus distance exhibited in narrowband wireless channels. These characteristics are illustrated in Figure 3.8, adding narrowband fading to the path loss and shadowing models developed in Chapter 2. In this figure we see the decrease in signal power due to path loss, which falls off as $d^{-\gamma}$ with $\gamma$ the path loss exponent, the more rapid variations due to shadowing, which change on the order of the decorrelation distance $X_c$, and the very rapid variations due to multipath fading, which change on the order of half the signal wavelength. If we blow up a small segment of this figure over distances where path loss and shadowing are constant we obtain Figure 3.9, where we show dB fluctuation in received power versus linear distance $d = vt$ (not log distance). In this figure the average received power $P_r$ is normalized to 0 dBm. A mobile receiver traveling at fixed velocity $v$ would experience the received power variations over time illustrated in this figure.

Figure 3.6: In-Phase and Quadrature PSD: $S_{r_I}(f) = S_{r_Q}(f)$

Figure 3.7: Cosine and PSD Approximation by Straight Line Segments

3.2.2 Envelope and Power Distributions 

For any two independent Gaussian random variables $X$ and $Y$, both with mean zero and equal variance $\sigma^2$, it can be shown that $Z = \sqrt{X^2 + Y^2}$ is Rayleigh-distributed and $Z^2$ is exponentially distributed. We saw above that for $\phi_n(t)$ uniformly distributed, $r_I$ and $r_Q$ are both zero-mean Gaussian random variables. If we assume a variance of $\sigma^2$ for both in-phase and quadrature components then the signal envelope

$$z(t) = |r(t)| = \sqrt{r_I^2(t) + r_Q^2(t)} \tag{3.31}$$

is Rayleigh-distributed with distribution

$$p_Z(z) = \frac{2z}{P_r}\exp[-z^2/P_r] = \frac{z}{\sigma^2}\exp[-z^2/(2\sigma^2)], \quad z \geq 0, \tag{3.32}$$

where $P_r = \sum_n E[\alpha_n^2] = 2\sigma^2$ is the average received signal power, i.e. the received power based on path loss and shadowing alone.

Figure 3.8: Combined Path Loss, Shadowing, and Narrowband Fading.

Figure 3.9: Narrowband Fading.

We obtain the power distribution by making the change of variables $z^2(t) = |r(t)|^2$ in (3.32) to obtain

$$p_{Z^2}(x) = \frac{1}{P_r}e^{-x/P_r} = \frac{1}{2\sigma^2}e^{-x/(2\sigma^2)}, \quad x \geq 0. \tag{3.33}$$

Thus, the received signal power is exponentially distributed with mean $2\sigma^2$. The complex lowpass equivalent signal for $r(t)$ is given by $r_{LP}(t) = r_I(t) + jr_Q(t)$, which has phase $\theta = \arctan(r_Q(t)/r_I(t))$. For $r_I(t)$ and $r_Q(t)$ uncorrelated Gaussian random variables we can show that $\theta$ is uniformly distributed and independent of $|r_{LP}|$. So $r(t)$ has a Rayleigh-distributed amplitude and uniform phase, and the two are mutually independent.



Example 3.2: Consider a channel with Rayleigh fading and average received power $P_r = 20$ dBm. Find the probability that the received power is below 10 dBm.

Solution: We have $P_r = 20$ dBm $= 100$ mW. We want to find the probability that $Z^2 < 10$ dBm $= 10$ mW. Thus

$$p(Z^2 < 10) = \int_0^{10}\frac{1}{100}e^{-x/100}\,dx = .095.$$
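The calculation in Example 3.2 can be reproduced with the closed-form CDF of the exponential power distribution (3.33), $p(Z^2 < x) = 1 - e^{-x/P_r}$. The helper names below are illustrative.

```python
import math

def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (dbm / 10.0)

def rayleigh_power_cdf(x, p_r):
    """P(Z^2 < x) for exponentially distributed power with mean p_r, from (3.33)."""
    return 1.0 - math.exp(-x / p_r)

p_r_mw = dbm_to_mw(20.0)         # average received power: 100 mW
threshold_mw = dbm_to_mw(10.0)   # threshold: 10 mW
prob = rayleigh_power_cdf(threshold_mw, p_r_mw)

print(f"P(Z^2 < 10 dBm) = {prob:.3f}")
```

The same function gives the outage probability for any threshold once both powers are expressed in linear units.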



If the channel has a fixed LOS component then $r_I(t)$ and $r_Q(t)$ are not zero-mean. In this case the received signal equals the superposition of a complex Gaussian component and a LOS component. The signal envelope in this case can be shown to have a Rician distribution [9], given by

$$p_Z(z) = \frac{z}{\sigma^2}\exp\left[\frac{-(z^2 + s^2)}{2\sigma^2}\right]I_0\left(\frac{zs}{\sigma^2}\right), \quad z \geq 0, \tag{3.34}$$

where $2\sigma^2 = \sum_{n, n \neq 0} E[\alpha_n^2]$ is the average power in the non-LOS multipath components and $s^2 = \alpha_0^2$ is the power in the LOS component. The function $I_0$ is the modified Bessel function of 0th order. The average received power in the Rician fading is given by

$$P_r = \int_0^\infty z^2 p_Z(z)\,dz = s^2 + 2\sigma^2. \tag{3.35}$$

The Rician distribution is often described in terms of a fading parameter $K$, defined by

$$K = \frac{s^2}{2\sigma^2}. \tag{3.36}$$



Thus, $K$ is the ratio of the power in the LOS component to the power in the other (non-LOS) multipath components. For $K = 0$ we have Rayleigh fading, and for $K = \infty$ we have no fading, i.e. a channel with no multipath and only a LOS component. The fading parameter $K$ is therefore a measure of the severity of the fading: a small $K$ implies severe fading, a large $K$ implies more mild fading. Making the substitution $s^2 = KP_r/(K + 1)$ and $2\sigma^2 = P_r/(K + 1)$ we can write the Rician distribution in terms of $K$ and $P_r$ as

$$p_Z(z) = \frac{2z(K + 1)}{P_r}\exp\left[-K - \frac{(K + 1)z^2}{P_r}\right]I_0\left(2z\sqrt{\frac{K(K + 1)}{P_r}}\right), \quad z \geq 0. \tag{3.37}$$
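A quick numerical check of (3.37) is to confirm that it integrates to one, that its mean square equals $P_r$ as in (3.35), and that it reduces to the Rayleigh pdf (3.32) when $K = 0$. The sketch below does this with a truncated power series for $I_0$ and a midpoint-rule integral. The values $K = 3$ and $P_r = 2$, the integration limit, and the step count are assumptions chosen for illustration.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0(x) via its power series sum_k (x/2)^(2k) / (k!)^2."""
    q = (x / 2.0) ** 2
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def rician_pdf(z, k_factor, p_r):
    """Rician envelope pdf in terms of K and P_r, equation (3.37)."""
    scale = 2.0 * z * (k_factor + 1.0) / p_r
    expo = math.exp(-k_factor - (k_factor + 1.0) * z * z / p_r)
    return scale * expo * bessel_i0(2.0 * z * math.sqrt(k_factor * (k_factor + 1.0) / p_r))

def rayleigh_pdf(z, p_r):
    """Rayleigh envelope pdf, equation (3.32)."""
    return 2.0 * z / p_r * math.exp(-z * z / p_r)

def integrate(func, lo, hi, steps=4000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * h) for i in range(steps)) * h

k_factor, p_r = 3.0, 2.0     # assumed example values
area = integrate(lambda z: rician_pdf(z, k_factor, p_r), 0.0, 8.0)
mean_power = integrate(lambda z: z * z * rician_pdf(z, k_factor, p_r), 0.0, 8.0)
gap_at_k0 = abs(rician_pdf(1.0, 0.0, p_r) - rayleigh_pdf(1.0, p_r))

print(f"integral of pdf = {area:.5f}")
print(f"mean power      = {mean_power:.5f} (P_r = {p_r})")
print(f"|Rician(K=0) - Rayleigh| at z = 1: {gap_at_k0:.2e}")
```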



Both the Rayleigh and Rician distributions can be obtained by using mathematics to capture the underlying physical properties of the channel models [1, 9]. However, some experimental data does not fit well into either of these distributions. Thus, a more general fading distribution was developed whose parameters can be adjusted to fit a variety of empirical measurements. This distribution is called the Nakagami fading distribution, and is given by

$$p_Z(z) = \frac{2m^m z^{2m-1}}{\Gamma(m)P_r^m}\exp\left[\frac{-mz^2}{P_r}\right], \quad m \geq .5, \tag{3.38}$$



where $P_r$ is the average received power and $\Gamma(\cdot)$ is the Gamma function. The Nakagami distribution is parameterized by $P_r$ and the fading parameter $m$. For $m = 1$ the distribution in (3.38) reduces to Rayleigh fading. For $m = (K + 1)^2/(2K + 1)$ the distribution in (3.38) is approximately Rician fading with parameter $K$. For $m = \infty$ there is no fading: $P_r$ is a constant. Thus, the Nakagami distribution can model Rayleigh and Rician distributions, as well as more general ones. Note that some empirical measurements support values of the $m$ parameter less than one, in which case the Nakagami fading causes more severe performance degradation than Rayleigh fading. The power distribution for Nakagami fading, obtained by a change of variables, is given by

$$p_{Z^2}(x) = \left(\frac{m}{P_r}\right)^m\frac{x^{m-1}}{\Gamma(m)}\exp\left(\frac{-mx}{P_r}\right). \tag{3.39}$$
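The special cases of the Nakagami distribution noted above can be checked directly from (3.38). This sketch uses `math.gamma` from the Python standard library; the test point, the value $K = 3$, and the integration settings are assumptions for illustration.

```python
import math

def nakagami_pdf(z, m, p_r):
    """Nakagami envelope pdf, equation (3.38), valid for m >= 0.5."""
    coeff = 2.0 * m ** m * z ** (2.0 * m - 1.0) / (math.gamma(m) * p_r ** m)
    return coeff * math.exp(-m * z * z / p_r)

def rayleigh_pdf(z, p_r):
    """Rayleigh envelope pdf, equation (3.32)."""
    return 2.0 * z / p_r * math.exp(-z * z / p_r)

def nakagami_m_from_k(k_factor):
    """Nakagami m that approximates Rician fading with parameter K."""
    return (k_factor + 1.0) ** 2 / (2.0 * k_factor + 1.0)

def integrate(func, lo, hi, steps=4000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * h) for i in range(steps)) * h

p_r = 2.0                                   # assumed average power
gap_m1 = abs(nakagami_pdf(1.3, 1.0, p_r) - rayleigh_pdf(1.3, p_r))
m_for_k3 = nakagami_m_from_k(3.0)           # (3 + 1)^2 / (2*3 + 1) = 16/7
area = integrate(lambda z: nakagami_pdf(z, m_for_k3, p_r), 0.0, 8.0)
mean_power = integrate(lambda z: z * z * nakagami_pdf(z, m_for_k3, p_r), 0.0, 8.0)

print(f"|Nakagami(m=1) - Rayleigh| at z = 1.3: {gap_m1:.2e}")
print(f"m for K = 3: {m_for_k3:.4f}")
print(f"integral of pdf = {area:.5f}, mean power = {mean_power:.5f}")
```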



3.2.3 Level Crossing Rate and Average Fade Duration 

The envelope level crossing rate $L_Z$ is defined as the expected rate (in crossings per second) at which the signal envelope crosses the level $Z$ in the downward direction. Obtaining $L_Z$ requires the joint distribution of the signal envelope $z = |r|$ and its derivative with respect to time $\dot{z}$, $p(z, \dot{z})$. We now derive $L_Z$ based on this joint distribution.

Consider the fading process shown in Figure 3.10. The expected amount of time the signal envelope spends in the interval $(Z, Z + dz)$ with envelope slope in the range $[\dot{z}, \dot{z} + d\dot{z}]$ over time duration $dt$ is $A = p(Z, \dot{z})\,dz\,d\dot{z}\,dt$. The time required to cross from $Z$ to $Z + dz$ once for a given envelope slope $\dot{z}$ is $B = dz/\dot{z}$. The ratio $A/B = \dot{z}p(Z, \dot{z})\,d\dot{z}\,dt$ is the expected number of crossings of the envelope $z$ within the interval $(Z, Z + dz)$ for a given envelope slope $\dot{z}$ over time duration $dt$. The expected number of crossings of the envelope level $Z$ for slopes between $\dot{z}$ and $\dot{z} + d\dot{z}$ in a time interval $[0, T]$ in the downward direction is thus

$$\int_0^T \dot{z}p(Z, \dot{z})\,d\dot{z}\,dt = \dot{z}p(Z, \dot{z})\,d\dot{z}\,T. \tag{3.40}$$

So the expected number of crossings of the envelope level $Z$ with negative slope over the interval $[0, T]$ is

$$N_Z = T\int_{-\infty}^{0}\dot{z}p(Z, \dot{z})\,d\dot{z}. \tag{3.41}$$

Finally, the expected number of crossings of the envelope level $Z$ per second, i.e. the level crossing rate, is

$$L_Z = \frac{N_Z}{T} = \int_{-\infty}^{0}\dot{z}p(Z, \dot{z})\,d\dot{z}. \tag{3.42}$$

Note that this is a general result that applies for any random process.

The joint pdf of $z$ and $\dot{z}$ for Rician fading was derived in [9] and can also be found in [11]. The level crossing rate for Rician fading is then obtained by using this pdf in (3.42), and is given by

$$L_Z = \sqrt{2\pi(K + 1)}\,f_D\,\rho\,e^{-K - (K + 1)\rho^2}I_0\left(2\rho\sqrt{K(K + 1)}\right), \tag{3.43}$$

where $\rho = Z/\sqrt{P_r}$. It is easily shown that the rate at which the received signal power crosses a threshold value $\gamma_0$ obeys the same formula (3.43) with $\rho = \sqrt{\gamma_0/P_r}$. For Rayleigh fading ($K = 0$) the level crossing rate simplifies to

$$L_Z = \sqrt{2\pi}\,f_D\,\rho\,e^{-\rho^2}, \tag{3.44}$$

where $\rho = Z/\sqrt{P_r}$.

Figure 3.10: Level Crossing Rate and Fade Duration for Fading Process.
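The closed forms (3.43) and (3.44) are straightforward to evaluate. The sketch below computes both and checks that the Rician expression reduces to the Rayleigh one at $K = 0$. The Doppler frequency of 80 Hz and the normalized level $\rho = 0.1$ are assumed example values, and $I_0$ is computed from its power series.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0(x) via its power series sum_k (x/2)^(2k) / (k!)^2."""
    q = (x / 2.0) ** 2
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def lcr_rician(rho, k_factor, f_d):
    """Level crossing rate for Rician fading, equation (3.43), in crossings/s."""
    return (math.sqrt(2.0 * math.pi * (k_factor + 1.0)) * f_d * rho
            * math.exp(-k_factor - (k_factor + 1.0) * rho * rho)
            * bessel_i0(2.0 * rho * math.sqrt(k_factor * (k_factor + 1.0))))

def lcr_rayleigh(rho, f_d):
    """Level crossing rate for Rayleigh fading, equation (3.44), in crossings/s."""
    return math.sqrt(2.0 * math.pi) * f_d * rho * math.exp(-rho * rho)

f_d = 80.0     # assumed maximum Doppler frequency in Hz
rho = 0.1      # envelope level 20 dB below the rms level

rate_rayleigh = lcr_rayleigh(rho, f_d)
rate_rician_k0 = lcr_rician(rho, 0.0, f_d)
rate_rician_k5 = lcr_rician(rho, 5.0, f_d)

print(f"Rayleigh:       {rate_rayleigh:.2f} crossings/s")
print(f"Rician, K = 0:  {rate_rician_k0:.2f} crossings/s")
print(f"Rician, K = 5:  {rate_rician_k5:.4f} crossings/s")
```

With a strong LOS component the envelope rarely drops 20 dB below its rms level, so the crossing rate at this level falls sharply as $K$ grows.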

We define the average signal fade duration as the average time that the signal envelope stays below a given target level $Z$. This target level is often obtained from the signal amplitude or power level required for a given performance metric like bit error rate. Let $t_i$ denote the duration of the ith fade below level $Z$ over a time interval $[0, T]$, as illustrated in Figure 3.10. Thus $t_i$ equals the length of time that the signal envelope stays below $Z$ on its ith crossing. Since $z(t)$ is stationary and ergodic, for $T$ sufficiently large we have

$$p(z(t) < Z) = \frac{1}{T}\sum_i t_i. \tag{3.45}$$

Thus, for $T$ sufficiently large the average fade duration is

$$\bar{t}_Z = \frac{1}{TL_Z}\sum_{i=1}^{L_ZT} t_i \approx \frac{p(z(t) < Z)}{L_Z}. \tag{3.46}$$

Using the Rayleigh distribution for $p(z(t) < Z)$ yields

$$\bar{t}_Z = \frac{e^{\rho^2} - 1}{\rho f_D\sqrt{2\pi}} \tag{3.47}$$



with $\rho = Z/\sqrt{P_r}$. Note that (3.47) is the average fade duration for the signal envelope (amplitude) level with $Z$ the target amplitude and $\sqrt{P_r}$ the average envelope level. By a change of variables it is easily shown that (3.47) also yields the average fade duration for the signal power level with $\rho = \sqrt{P_0/P_r}$, where $P_0$ is the target power level and $P_r$ is the average power level. Note that average fade duration decreases with Doppler, since as a channel changes more quickly it remains below a given fade level for a shorter period of time. The average fade duration also generally increases with $\rho$ for $\rho \gg 1$. That is because as the target level increases relative to the average, the signal is more likely to be below the target. The average fade duration for Rician fading is more difficult to compute; it can be found in [11, Chapter 1.4].

The average fade duration indicates the number of bits or symbols affected by a deep fade. Specifically, consider an uncoded system with bit time $T_b$. Suppose the probability of bit error is high when $z < Z$. Then if $T_b \approx \bar{t}_Z$, the system will likely experience single error events, where bits that are received in error have the previous and subsequent bits received correctly (since $z > Z$ for these bits). On the other hand, if $T_b \ll \bar{t}_Z$ then many subsequent bits are received with $z < Z$, so large bursts of errors are likely. Finally, if $T_b \gg \bar{t}_Z$ the fading is averaged out over a bit time in the demodulator, so the fading can be neglected. These issues will be explored in more detail in Chapter 8, when we consider coding and interleaving.

Example 3.3: Consider a voice system with acceptable BER when the received signal power is at or above half its average value. If the BER is worse than its acceptable level for more than 120 ms, users will turn off their phone. Find the range of Doppler values in a Rayleigh fading channel such that the average time duration when users have unacceptable voice quality is less than $t = 60$ ms.

Solution: The target received signal value is half the average, so $P_0 = .5P_r$ and thus $\rho = \sqrt{.5}$. We require

$$\bar{t}_Z = \frac{e^{.5} - 1}{f_D\sqrt{\pi}} \leq t = .060$$

and thus $f_D \geq (e^{.5} - 1)/(.060\sqrt{\pi}) = 6.1$ Hz.
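The Doppler bound in Example 3.3 follows from solving (3.47) for $f_D$. A short sketch with illustrative function names:

```python
import math

def afd_rayleigh(rho, f_d):
    """Average fade duration for Rayleigh fading, equation (3.47), in seconds."""
    return (math.exp(rho * rho) - 1.0) / (rho * f_d * math.sqrt(2.0 * math.pi))

def min_doppler(rho, max_duration_s):
    """Smallest f_D (Hz) for which the average fade duration is at most max_duration_s."""
    return (math.exp(rho * rho) - 1.0) / (rho * max_duration_s * math.sqrt(2.0 * math.pi))

rho = math.sqrt(0.5)              # target power is half the average power
f_d_min = min_doppler(rho, 0.060)

print(f"minimum Doppler = {f_d_min:.1f} Hz")
print(f"fade duration at that Doppler = {afd_rayleigh(rho, f_d_min) * 1e3:.1f} ms")
```

Because (3.47) is inversely proportional to $f_D$, doubling the Doppler frequency halves the average fade duration at a fixed $\rho$.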



3.2.4 Finite State Markov Channels 

The complex mathematical characterization of flat fading described in the previous subsections can be difficult to incorporate into wireless performance analysis such as the packet error probability. Therefore, simpler models that capture the main features of flat fading channels are needed for these analytical calculations. One such model is a finite state Markov channel (FSMC). In this model fading is approximated as a discrete-time Markov process with time discretized to a given interval $T$ (typically the symbol period). Specifically, the set of all possible fading gains is modeled as a set of finite channel states. The channel varies over these states at each interval $T$ according to a set of Markov transition probabilities. FSMCs have been used to approximate both mathematical and experimental fading models, including satellite channels [13], indoor channels [14], Rayleigh fading channels [15, 19], Rician fading channels [20], and Nakagami-m fading channels [17]. They have also been used for system design and system performance analysis in [18, 19]. First-order FSMC models have been shown to be deficient for performance analysis, so higher order models are generally used. The FSMC models for fading typically model amplitude variations only, although there has been some work on FSMC models for phase in fading [21] or phase-noisy channels [22].

A detailed FSMC model for Rayleigh fading was developed in [15]. In this model the time-varying SNR associated with the Rayleigh fading, $\gamma$, lies in the range $0 \leq \gamma < \infty$. The FSMC model discretizes this fading range into regions so that the jth region $R_j$ is defined as $R_j = \{\gamma : A_j \leq \gamma < A_{j+1}\}$, where the region boundaries $\{A_j\}$ and the total number of fade regions are parameters of the model. This model assumes that $\gamma$ stays within the same region over time interval $T$ and can only transition to the same region or adjacent regions at time $T + 1$. Thus, given that the channel is in state $R_j$ at time $T$, at the next time interval the channel can only transition to $R_{j-1}$, $R_j$, or $R_{j+1}$, a reasonable assumption when $f_DT$ is small. Under this assumption the transition probabilities between regions are derived in [15] as

$$p_{j,j+1} = \frac{N_{j+1}T_s}{\pi_j}, \quad p_{j,j-1} = \frac{N_jT_s}{\pi_j}, \quad p_{j,j} = 1 - p_{j,j+1} - p_{j,j-1}, \tag{3.48}$$

where $N_j$ is the level-crossing rate at $A_j$, $T_s$ is the discretization interval (denoted $T$ above), and $\pi_j$ is the steady-state distribution corresponding to the jth region: $\pi_j = p(\gamma \in R_j) = p(A_j \leq \gamma < A_{j+1})$.
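The transition probabilities in (3.48) can be assembled into a transition matrix. The following is a minimal sketch, not the construction used in [15]: it assumes the exponential power distribution (3.33) for the steady-state probabilities and the Rayleigh level crossing rate (3.44) with $\rho = \sqrt{A_j/\bar{\gamma}}$, where $\bar{\gamma}$ is the average SNR. The region boundaries, Doppler frequency, and interval are hypothetical values.

```python
import math

def crossing_rate(level, mean_snr, f_d):
    """Rayleigh level crossing rate (3.44) at SNR `level`, with rho^2 = level/mean_snr."""
    if level == 0.0 or math.isinf(level):
        return 0.0          # the boundaries at 0 and infinity are never crossed
    rho = math.sqrt(level / mean_snr)
    return math.sqrt(2.0 * math.pi) * f_d * rho * math.exp(-rho * rho)

def fsmc_rayleigh(bounds, mean_snr, f_d, t_s):
    """Steady-state probabilities and transition matrix from equation (3.48)."""
    n_states = len(bounds) - 1
    # pi_j = p(A_j <= gamma < A_{j+1}) for exponentially distributed SNR.
    pi = [math.exp(-bounds[j] / mean_snr) - math.exp(-bounds[j + 1] / mean_snr)
          for j in range(n_states)]
    rates = [crossing_rate(a, mean_snr, f_d) for a in bounds]
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for j in range(n_states):
        if j + 1 < n_states:
            matrix[j][j + 1] = rates[j + 1] * t_s / pi[j]   # move up one region
        if j > 0:
            matrix[j][j - 1] = rates[j] * t_s / pi[j]       # move down one region
        matrix[j][j] = 1.0 - sum(matrix[j])                 # stay in the same region
    return pi, matrix

# Hypothetical model parameters (assumptions for this sketch).
bounds = [0.0, 0.5, 1.0, 2.0, math.inf]   # region boundaries A_j, linear SNR
pi, matrix = fsmc_rayleigh(bounds, mean_snr=1.0, f_d=10.0, t_s=1e-3)

for j, row in enumerate(matrix):
    print(f"state {j}: pi = {pi[j]:.4f}, row = " + ", ".join(f"{p:.4f}" for p in row))
```

With $f_DT_s = 0.01$ the off-diagonal entries are a few percent, so the channel mostly stays in its current region from one interval to the next, as the model assumes.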
3.3 Wideband Fading Models



When the signal is not narrowband we get another form of distortion due to the multipath delay spread. In this case a short transmitted pulse of duration $T$ will result in a received signal that is of duration $T + T_m$, where $T_m$ is the multipath delay spread. Thus, the duration of the received signal may be significantly increased. This is illustrated in Figure 3.11. In this figure, a pulse of width $T$ is transmitted over a multipath channel. As discussed in Chapter 5, linear modulation consists of a train of pulses where each pulse carries information in its amplitude and/or phase corresponding to a data bit or symbol (see footnote 5). If the multipath delay spread $T_m \ll T$ then the multipath components are received roughly on top of one another, as shown on the upper right of the figure. The resulting constructive and destructive interference causes narrowband fading of the pulse, but there is little time-spreading of the pulse and therefore little interference with a subsequently transmitted pulse. On the other hand, if the multipath delay spread $T_m \gg T$, then each of the different multipath components can be resolved, as shown in the lower right of the figure. However, these multipath components interfere with subsequently transmitted pulses. This effect is called intersymbol interference (ISI).

There are several techniques to mitigate the distortion due to multipath delay spread, including equalization, multicarrier modulation, and spread spectrum, which are discussed in Chapters 11-13. ISI mitigation is not necessary if $T \gg T_m$, but this can place significant constraints on data rate. Multicarrier modulation and spread spectrum actually change the characteristics of the transmitted signal to mostly avoid intersymbol interference; however, they still experience multipath distortion due to frequency-selective fading, which is described in Section 3.3.2.



Figure 3.11: Multipath Resolution.

The difference between wideband and narrowband fading models is that as the transmit signal bandwidth $B$ increases so that $T_m \approx B^{-1}$, the approximation $u(t - \tau_n(t)) \approx u(t)$ is no longer valid. Thus, the received signal is a sum of copies of the original signal, where each copy is delayed in time by $\tau_n$ and shifted in phase by $\phi_n(t)$. The signal copies will combine destructively when their phase terms differ significantly, and will distort the direct path signal when $u(t - \tau_n)$ differs from $u(t)$.

Although the approximation in (3.11) no longer applies when the signal bandwidth is large relative to the 
inverse of the multipath delay spread, if the number of multipath components is large and the phase of each com- 
ponent is uniformly distributed then the received signal will still be a zero-mean complex Gaussian process with 
a Rayleigh-distributed envelope. However, wideband fading differs from narrowband fading in terms of the reso- 
lution of the different multipath components. Specifically, for narrowband signals, the multipath components have 
a time resolution that is less than the inverse of the signal bandwidth, so the multipath components characterized 

5 Linear modulation typically uses nonsquare pulse shapes for bandwidth efficiency, as discussed in Section 5.4.









in Equation (3.6) combine at the receiver to yield the original transmitted signal with amplitude and phase
characterized by random processes. These random processes are characterized by their autocorrelation or PSD, and
their instantaneous distributions, as discussed in Section 3.2. However, with wideband signals, the received signal
experiences distortion due to the delay spread of the different multipath components, so the received signal can no 
longer be characterized by just the amplitude and phase random processes. The effect of multipath on wideband 
signals must therefore take into account both the multipath delay spread and the time-variations associated with 
the channel. 

The starting point for characterizing wideband channels is the equivalent lowpass time-varying channel
impulse response $c(\tau, t)$. Let us first assume that $c(\tau, t)$ is a continuous (see footnote 6) deterministic function of $\tau$ and $t$. Recall that
$\tau$ represents the impulse response associated with a given multipath delay, while $t$ represents time variations. We
can take the Fourier transform of $c(\tau, t)$ with respect to $t$ as

$$S_c(\tau, \rho) = \int_{-\infty}^{\infty} c(\tau, t)\, e^{-j 2\pi \rho t}\, dt. \qquad (3.49)$$

We call $S_c(\tau, \rho)$ the deterministic scattering function of the lowpass equivalent channel impulse response $c(\tau, t)$.
Since it is the Fourier transform of $c(\tau, t)$ with respect to the time variation parameter $t$, the deterministic scattering
function $S_c(\tau, \rho)$ captures the Doppler characteristics of the channel via the frequency parameter $\rho$.

In general the time-varying channel impulse response $c(\tau, t)$ given by (3.6) is random instead of deterministic
due to the random amplitudes, phases, and delays of the random number of multipath components. In this case we
must characterize it statistically or via measurements. As long as the number of multipath components is large,
we can invoke the Central Limit Theorem to assume that $c(\tau, t)$ is a complex Gaussian process, so its statistical
characterization is fully known from the mean, autocorrelation, and cross-correlation of its in-phase and quadrature
components. As in the narrowband case, we assume that the phase of each multipath component is uniformly
distributed. Thus, the in-phase and quadrature components of $c(\tau, t)$ are independent Gaussian processes with the
same autocorrelation, a mean of zero, and a cross-correlation of zero. The same statistics hold for the in-phase
and quadrature components if the channel contains only a small number of multipath rays as long as each ray has
a Rayleigh-distributed amplitude and uniform phase. Note that this model does not hold when the channel has a
dominant LOS component.

The statistical characterization of $c(\tau, t)$ is thus determined by its autocorrelation function, defined as

$$A_c(\tau_1, \tau_2; t, \Delta t) = E[c^*(\tau_1; t)\, c(\tau_2; t + \Delta t)]. \qquad (3.50)$$

Most channels in practice are wide-sense stationary (WSS), such that the joint statistics of a channel measured
at two different times $t$ and $t + \Delta t$ depend only on the time difference $\Delta t$. For wide-sense stationary channels,
the autocorrelation of the corresponding bandpass channel $h(\tau, t) = \Re\{c(\tau, t) e^{j 2\pi f_c t}\}$ can be obtained [16] from
$A_c(\tau_1, \tau_2; t, \Delta t)$ as (see footnote 7) $A_h(\tau_1, \tau_2; t, \Delta t) = .5\,\Re\{A_c(\tau_1, \tau_2; t, \Delta t) e^{j 2\pi f_c \Delta t}\}$. We will assume that our channel model
is WSS, in which case the autocorrelation becomes independent of $t$:

$$A_c(\tau_1, \tau_2; \Delta t) = E[c^*(\tau_1; t)\, c(\tau_2; t + \Delta t)]. \qquad (3.51)$$

Moreover, in practice the channel response associated with a given multipath component of delay $\tau_1$ is uncorrelated
with the response associated with a multipath component at a different delay $\tau_2 \neq \tau_1$, since the two components
are caused by different scatterers. We say that such a channel has uncorrelated scattering (US). We abbreviate

6 The wideband channel characterizations in this section can also be done for discrete-time channels that are discrete with respect to $\tau$
by changing integrals to sums and Fourier transforms to discrete Fourier transforms.

7 It is easily shown that the autocorrelation of the passband channel response $h(\tau, t)$ is given by $E[h(\tau_1, t) h(\tau_2, t + \Delta t)] = .5\,\Re\{A_c(\tau_1, \tau_2; t, \Delta t) e^{j 2\pi f_c \Delta t}\} + .5\,\Re\{\hat{A}_c(\tau_1, \tau_2; t, \Delta t) e^{j 2\pi f_c (2t + \Delta t)}\}$, where $\hat{A}_c(\tau_1, \tau_2; t, \Delta t) = E[c(\tau_1; t) c(\tau_2; t + \Delta t)]$. However, if
$c(\tau, t)$ is WSS then $\hat{A}_c(\tau_1, \tau_2; t, \Delta t) = 0$, so $E[h(\tau_1, t) h(\tau_2, t + \Delta t)] = .5\,\Re\{A_c(\tau_1, \tau_2; t, \Delta t) e^{j 2\pi f_c \Delta t}\}$.
channels that are WSS with US as WSSUS channels. The WSSUS channel model was first introduced by Bello
in his landmark paper [16], where he also developed two-dimensional transform relationships associated with this 
autocorrelation. These relationships will be discussed in Section 3.3.4. Incorporating the US property into (3.51) 
yields 

$$E[c^*(\tau_1; t)\, c(\tau_2; t + \Delta t)] = A_c(\tau_1; \Delta t)\, \delta[\tau_1 - \tau_2] \triangleq A_c(\tau; \Delta t), \qquad (3.52)$$

where $A_c(\tau; \Delta t)$ gives the average output power associated with the channel as a function of the multipath delay
$\tau = \tau_1 = \tau_2$ and the difference $\Delta t$ in observation time. This function assumes that $\tau_1$ and $\tau_2$ satisfy
$|\tau_1 - \tau_2| > B^{-1}$, since otherwise the receiver can't resolve the two components. When the two components cannot
be resolved, they are modeled as a single combined multipath component with delay $\tau \approx \tau_1 \approx \tau_2$.

The scattering function for random channels is defined as the Fourier transform of $A_c(\tau; \Delta t)$ with respect to
the $\Delta t$ parameter:

$$S_c(\tau, \rho) = \int_{-\infty}^{\infty} A_c(\tau, \Delta t)\, e^{-j 2\pi \rho \Delta t}\, d\Delta t. \qquad (3.53)$$

The scattering function characterizes the average output power associated with the channel as a function of the
multipath delay $\tau$ and Doppler $\rho$. Note that we use the same notation for the deterministic scattering and random
scattering functions since the function is uniquely defined depending on whether the channel impulse response is
deterministic or random. A typical scattering function is shown in Figure 3.12.




Figure 3.12: Scattering Function. 

The most important characteristics of the wideband channel, including the power delay profile, coherence
bandwidth, Doppler power spectrum, and coherence time, are derived from the channel autocorrelation $A_c(\tau, \Delta t)$
or scattering function $S_c(\tau, \rho)$. These characteristics are described in the subsequent sections.

3.3.1 Power Delay Profile 

The power delay profile $A_c(\tau)$, also called the multipath intensity profile, is defined as the autocorrelation (3.52)
with $\Delta t = 0$: $A_c(\tau) \triangleq A_c(\tau, 0)$. The power delay profile represents the average power associated with a given
multipath delay, and is easily measured empirically. The average and rms delay spread are typically defined in
terms of the power delay profile $A_c(\tau)$ as



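$$\mu_{T_m} = \frac{\int_0^{\infty} \tau A_c(\tau)\, d\tau}{\int_0^{\infty} A_c(\tau)\, d\tau} \qquad (3.54)$$

and

$$\sigma_{T_m} = \sqrt{\frac{\int_0^{\infty} (\tau - \mu_{T_m})^2 A_c(\tau)\, d\tau}{\int_0^{\infty} A_c(\tau)\, d\tau}}. \qquad (3.55)$$

Note that if we define the pdf $p_{T_m}$ of the random delay spread $T_m$ in terms of $A_c(\tau)$ as

$$p_{T_m}(\tau) = \frac{A_c(\tau)}{\int_0^{\infty} A_c(\tau)\, d\tau} \qquad (3.56)$$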



then $\mu_{T_m}$ and $\sigma_{T_m}$ are the mean and rms values of $T_m$, respectively, relative to this pdf. Defining the pdf of $T_m$
by (3.56) or, equivalently, defining the mean and rms delay spread by (3.54) and (3.55), respectively, weights 
the delay associated with a given multipath component by its relative power, so that weak multipath components 
contribute less to delay spread than strong ones. In particular, multipath components below the noise floor will not 
significantly impact these delay spread characterizations. 

The time delay $T$ where $A_c(\tau) \approx 0$ for $\tau \geq T$ can be used to roughly characterize the delay spread of the
channel, and this value is often taken to be a small integer multiple of the rms delay spread, i.e. $A_c(\tau) \approx 0$ for
$\tau > 3\sigma_{T_m}$. With this approximation a linearly modulated signal with symbol period $T_s$ experiences significant
ISI if $T_s \ll \sigma_{T_m}$. Conversely, when $T_s \gg \sigma_{T_m}$ the system experiences negligible ISI. For calculations one can
assume that $T_s \ll \sigma_{T_m}$ implies $T_s < \sigma_{T_m}/10$ and $T_s \gg \sigma_{T_m}$ implies $T_s > 10\sigma_{T_m}$. When $T_s$ is within an
order of magnitude of $\sigma_{T_m}$ then there will be some ISI which may or may not significantly degrade performance,
depending on the specifics of the system and channel. We will study the performance degradation due to ISI in
linearly modulated systems as well as ISI mitigation methods in later chapters.
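
The factor-of-ten rule of thumb above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `isi_regime`, the `factor` parameter, and the 1 microsecond delay spread are assumptions made for the example, not part of the text.

```python
def isi_regime(symbol_period_s, rms_delay_spread_s, factor=10.0):
    """Classify ISI severity using the order-of-magnitude rule of thumb.

    `factor` is the margin that stands in for ">>" and "<<" (10 in the text).
    """
    if symbol_period_s > factor * rms_delay_spread_s:
        return "negligible"
    if symbol_period_s < rms_delay_spread_s / factor:
        return "significant"
    # Within an order of magnitude: some ISI, impact depends on the system.
    return "intermediate"


# Hypothetical channel with a 1 microsecond rms delay spread.
sigma = 1e-6
for ts in (100e-6, 1e-6, 0.01e-6):
    print(f"T_s = {ts:.2e} s -> {isi_regime(ts, sigma)} ISI")
```

The middle case is deliberately left as its own category, since the text notes that performance in that range depends on the specifics of the system and channel.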

While $\mu_{T_m} \approx \sigma_{T_m}$ in many channels with a large number of scatterers, the exact relationship between $\mu_{T_m}$
and $\sigma_{T_m}$ depends on the shape of $A_c(\tau)$. A channel with no LOS component and a small number of multipath
components with approximately the same large delay will have $\mu_{T_m} \gg \sigma_{T_m}$. In this case the large value of $\mu_{T_m}$
is a misleading metric of delay spread, since in fact all copies of the transmitted signal arrive at roughly the same
time and the demodulator would synchronize to this common delay. It is typically assumed that the synchronizer
locks to the multipath component at approximately the mean delay, in which case rms delay spread characterizes
the time-spreading of the channel.



Example 3.4: 

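The power delay profile is often modeled as having a one-sided exponential distribution:

$$A_c(\tau) = \frac{1}{\overline{T}_m} e^{-\tau/\overline{T}_m}, \quad \tau \geq 0.$$

Show that the average delay spread (3.54) is $\mu_{T_m} = \overline{T}_m$ and find the rms delay spread (3.55).
Solution: It is easily shown that $A_c(\tau)$ integrates to one. The average delay spread is thus given by

$$\mu_{T_m} = \frac{1}{\overline{T}_m} \int_0^{\infty} \tau e^{-\tau/\overline{T}_m}\, d\tau = \overline{T}_m,$$

and the rms delay spread is

$$\sigma_{T_m} = \sqrt{\frac{1}{\overline{T}_m} \int_0^{\infty} \tau^2 e^{-\tau/\overline{T}_m}\, d\tau - \mu_{T_m}^2} = \sqrt{2\overline{T}_m^2 - \overline{T}_m^2} = \overline{T}_m.$$

Thus, the average and rms delay spread are the same for exponentially distributed power delay profiles.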
Example 3.5:

Consider a wideband channel with multipath intensity profile

$$A_c(\tau) = \begin{cases} e^{-\tau/.00001} & 0 \leq \tau \leq 20\ \mu\text{sec} \\ 0 & \text{else} \end{cases}$$

Find the mean and rms delay spreads of the channel and find the maximum symbol rate such that a linearly-modulated
signal transmitted through this channel does not experience ISI.

Solution: The average delay spread is

$$\mu_{T_m} = \frac{\int_0^{20 \cdot 10^{-6}} \tau e^{-\tau/.00001}\, d\tau}{\int_0^{20 \cdot 10^{-6}} e^{-\tau/.00001}\, d\tau} = 6.87\ \mu\text{sec}.$$

The rms delay spread is

$$\sigma_{T_m} = \sqrt{\frac{\int_0^{20 \cdot 10^{-6}} (\tau - \mu_{T_m})^2 e^{-\tau/.00001}\, d\tau}{\int_0^{20 \cdot 10^{-6}} e^{-\tau/.00001}\, d\tau}} = 5.25\ \mu\text{sec}.$$



We see in this example that the mean delay spread is roughly equal to its rms value. To avoid ISI we require linear
modulation to have a symbol period $T_s$ that is large relative to $\sigma_{T_m}$. Taking this to mean that $T_s > 10\sigma_{T_m}$ yields a
symbol period of $T_s = 52.5$ $\mu$sec or a symbol rate of $R_s = 1/T_s = 19.04$ Kilosymbols per second. This is a highly
constrained symbol rate for many wireless systems. Specifically, for binary modulations where the symbol rate
equals the data rate (bits per second, or bps), high-quality voice requires on the order of 32 Kbps and high-speed
data requires on the order of 10-100 Mbps.
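
The numbers in Example 3.5 can be checked numerically. The sketch below approximates the integrals in (3.54) and (3.55) with a midpoint rule; the helper name `delay_spread` and the step count are assumptions chosen for illustration, and any other quadrature method would serve equally well.

```python
import math


def delay_spread(pdp, tau_max, steps=100_000):
    """Mean and rms delay spread of a power delay profile on [0, tau_max].

    The integrals in (3.54) and (3.55) are approximated with the midpoint
    rule; the common step width cancels in each ratio.
    """
    dt = tau_max / steps
    taus = [(k + 0.5) * dt for k in range(steps)]
    weights = [pdp(t) for t in taus]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(taus, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(taus, weights)) / total
    return mean, math.sqrt(var)


# Profile of Example 3.5: exp(-tau/.00001) truncated at 20 microseconds.
mean, rms = delay_spread(lambda tau: math.exp(-tau / 1e-5), 20e-6)
max_symbol_rate = 1.0 / (10.0 * rms)  # from T_s > 10 * sigma_Tm
print(f"mean = {mean * 1e6:.2f} us, rms = {rms * 1e6:.2f} us")
print(f"max symbol rate = {max_symbol_rate / 1e3:.2f} ksymbols/s")
```

The same helper applies to any measured or modeled profile, provided the truncation point `tau_max` covers the delays that carry non-negligible power.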



3.3.2 Coherence Bandwidth 

We can also characterize the time-varying multipath channel in the frequency domain by taking the Fourier
transform of $c(\tau, t)$ with respect to $\tau$. Specifically, define the random process

$$C(f; t) = \int_{-\infty}^{\infty} c(\tau; t)\, e^{-j 2\pi f \tau}\, d\tau. \qquad (3.57)$$

Since $c(\tau; t)$ is a complex zero-mean Gaussian random process in $t$, the Fourier transform above just represents
the sum (see footnote 8) of complex zero-mean Gaussian random processes, and therefore $C(f; t)$ is also a zero-mean Gaussian
random process completely characterized by its autocorrelation. Since $c(\tau; t)$ is WSS, its integral $C(f; t)$ is as
well. Thus, the autocorrelation of (3.57) is given by

$$A_C(f_1, f_2; \Delta t) = E[C^*(f_1; t)\, C(f_2; t + \Delta t)]. \qquad (3.58)$$

8 We can express the integral as a limit of a discrete sum.







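We can simplify $A_C(f_1, f_2; \Delta t)$ as

$$\begin{aligned}
A_C(f_1, f_2; \Delta t) &= E\left[\int_{-\infty}^{\infty} c^*(\tau_1; t)\, e^{j 2\pi f_1 \tau_1}\, d\tau_1 \int_{-\infty}^{\infty} c(\tau_2; t + \Delta t)\, e^{-j 2\pi f_2 \tau_2}\, d\tau_2\right] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E[c^*(\tau_1; t)\, c(\tau_2; t + \Delta t)]\, e^{j 2\pi f_1 \tau_1} e^{-j 2\pi f_2 \tau_2}\, d\tau_1\, d\tau_2 \\
&= \int_{-\infty}^{\infty} A_c(\tau, \Delta t)\, e^{-j 2\pi (f_2 - f_1)\tau}\, d\tau \\
&= A_C(\Delta f; \Delta t) \qquad (3.59)
\end{aligned}$$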

where $\Delta f = f_2 - f_1$ and the third equality follows from the WSS and US properties of $c(\tau; t)$. Thus, the
autocorrelation of $C(f; t)$ in frequency depends only on the frequency difference $\Delta f$. The function $A_C(\Delta f; \Delta t)$
can be measured in practice by transmitting a pair of sinusoids through the channel that are separated in frequency
by $\Delta f$ and calculating their cross correlation at the receiver for the time separation $\Delta t$.

If we define $A_C(\Delta f) \triangleq A_C(\Delta f; 0)$ then from (3.59),

$$A_C(\Delta f) = \int_{-\infty}^{\infty} A_c(\tau)\, e^{-j 2\pi \Delta f \tau}\, d\tau. \qquad (3.60)$$

So $A_C(\Delta f)$ is the Fourier transform of the power delay profile. Since $A_C(\Delta f) = E[C^*(f; t)\, C(f + \Delta f; t)]$ is an
autocorrelation, the channel response is approximately independent at frequency separations $\Delta f$ where
$A_C(\Delta f) \approx 0$. The frequency $B_c$ where $A_C(\Delta f) \approx 0$ for all $\Delta f > B_c$ is called the coherence bandwidth of the channel. By
the Fourier transform relationship between $A_c(\tau)$ and $A_C(\Delta f)$, if $A_c(\tau) \approx 0$ for $\tau > T$ then $A_C(\Delta f) \approx 0$ for
$\Delta f > 1/T$. Thus, the minimum frequency separation $B_c$ for which the channel response is roughly independent
is $B_c \approx 1/T$, where $T$ is typically taken to be the rms delay spread $\sigma_{T_m}$ of $A_c(\tau)$. A more general approximation
is $B_c \approx k/\sigma_{T_m}$ where $k$ depends on the shape of $A_c(\tau)$ and the precise specification of coherence bandwidth. For
example, Lee has shown that $B_c \approx .02/\sigma_{T_m}$ approximates the range of frequencies over which channel correlation
exceeds 0.9, while $B_c \approx .2/\sigma_{T_m}$ approximates the range of frequencies over which this correlation exceeds 0.5
[12].
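
To make the dependence on the constant $k$ concrete, the following sketch evaluates $B_c \approx k/\sigma_{T_m}$ for the three values of $k$ mentioned above. The function name `coherence_bandwidth` and the 1 microsecond rms delay spread are assumptions chosen only for illustration.

```python
def coherence_bandwidth(rms_delay_spread_s, k=1.0):
    """Approximate coherence bandwidth B_c ~ k / sigma_Tm, in Hz."""
    return k / rms_delay_spread_s


sigma = 1e-6  # assumed rms delay spread of 1 microsecond
cases = (
    ("correlation above 0.9", 0.02),
    ("correlation above 0.5", 0.2),
    ("roughly independent", 1.0),
)
for label, k in cases:
    bc = coherence_bandwidth(sigma, k)
    print(f"{label}: k = {k}, B_c = {bc / 1e3:.0f} kHz")
```

The factor of 50 between the smallest and largest result shows why a coherence bandwidth figure is only meaningful together with the correlation level it refers to.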

In general, if we are transmitting a narrowband signal with bandwidth $B \ll B_c$, then fading across the entire
signal bandwidth is highly correlated, i.e. the fading is roughly equal across the entire signal bandwidth. This is
usually referred to as flat fading. On the other hand, if the signal bandwidth $B \gg B_c$, then the channel amplitude
values at frequencies separated by more than the coherence bandwidth are roughly independent. Thus, the channel
amplitude varies widely across the signal bandwidth. In this case the channel is called frequency-selective. When
$B \approx B_c$ then channel behavior is somewhere between flat and frequency-selective fading. Note that in linear
modulation the signal bandwidth $B$ is inversely proportional to the symbol time $T_s$, so flat fading corresponds to
$T_s \approx 1/B \gg 1/B_c \approx \sigma_{T_m}$, i.e. the case where the channel experiences negligible ISI. Frequency-selective
fading corresponds to $T_s \approx 1/B \ll 1/B_c \approx \sigma_{T_m}$, i.e. the case where the linearly modulated signal experiences
significant ISI. Wideband signaling formats that reduce ISI, such as multicarrier modulation and spread spectrum,
still experience frequency-selective fading across their entire signal bandwidth which causes performance
degradation, as will be discussed in Chapters 12 and 13, respectively.

We illustrate the power delay profile $A_c(\tau)$ and its Fourier transform $A_C(\Delta f)$ in Figure 3.13. This figure
also shows two signals superimposed on $A_C(\Delta f)$, a narrowband signal with bandwidth much less than $B_c$ and
a wideband signal with bandwidth much greater than $B_c$. We see that the autocorrelation $A_C(\Delta f)$ is flat across
the bandwidth of the narrowband signal, so this signal will experience flat fading or, equivalently, negligible ISI.
The autocorrelation $A_C(\Delta f)$ goes to zero within the bandwidth of the wideband signal, which means that fading
will be independent across different parts of the signal bandwidth, so fading is frequency selective and a
linearly-modulated signal transmitted through this channel will experience significant ISI.




Figure 3.13: Power Delay Profile, RMS Delay Spread, and Coherence Bandwidth. 



Example 3.6: In indoor channels $\sigma_{T_m} \approx 50$ ns whereas in outdoor microcells $\sigma_{T_m} \approx 30$ $\mu$sec. Find the maximum
symbol rate $R_s = 1/T_s$ for these environments such that a linearly-modulated signal transmitted through these
environments experiences negligible ISI.

Solution. We assume that negligible ISI requires $T_s \gg \sigma_{T_m}$, i.e. $T_s \geq 10\sigma_{T_m}$. This translates to a symbol rate
$R_s = 1/T_s \leq .1/\sigma_{T_m}$. For $\sigma_{T_m} \approx 50$ ns this yields $R_s \leq 2$ Mbps and for $\sigma_{T_m} \approx 30$ $\mu$sec this yields $R_s \leq 3.33$
Kbps. Note that indoor systems currently support up to 50 Mbps and outdoor systems up to 200 Kbps. To maintain
these data rates for a linearly-modulated signal without severe performance degradation due to ISI, some form of
ISI mitigation is needed. Moreover, ISI is less severe in indoor systems than in outdoor systems due to their lower
delay spread values, which is why indoor systems tend to have higher data rates than outdoor systems.
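
The arithmetic of Example 3.6 can be reproduced with a short sketch. The function name `max_symbol_rate` and its `factor` parameter are assumptions introduced here; the factor of 10 is the margin used in the example.

```python
def max_symbol_rate(rms_delay_spread_s, factor=10.0):
    """Largest symbol rate R_s = 1/T_s that satisfies T_s >= factor * sigma_Tm."""
    return 1.0 / (factor * rms_delay_spread_s)


# Delay spreads quoted in Example 3.6.
for name, sigma in (("indoor", 50e-9), ("outdoor microcell", 30e-6)):
    rate = max_symbol_rate(sigma)
    print(f"{name}: R_s up to about {rate:.3g} symbols/s")
```

Lowering `factor` trades a higher symbol rate for more residual ISI, which is the margin an equalizer or other mitigation technique is meant to recover.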



3.3.3 Doppler Power Spectrum and Channel Coherence Time 

The time variations of the channel which arise from transmitter or receiver motion cause a Doppler shift in the
received signal. This Doppler effect can be characterized by taking the Fourier transform of $A_C(\Delta f; \Delta t)$ relative
to $\Delta t$:

$$S_C(\Delta f; \rho) = \int_{-\infty}^{\infty} A_C(\Delta f; \Delta t)\, e^{-j 2\pi \rho \Delta t}\, d\Delta t. \qquad (3.61)$$

In order to characterize Doppler at a single frequency, we set $\Delta f$ to zero and define $S_C(\rho) \triangleq S_C(0; \rho)$. It is
easily seen that

$$S_C(\rho) = \int_{-\infty}^{\infty} A_C(\Delta t)\, e^{-j 2\pi \rho \Delta t}\, d\Delta t \qquad (3.62)$$

where $A_C(\Delta t) \triangleq A_C(\Delta f = 0; \Delta t)$. Note that $A_C(\Delta t)$ is an autocorrelation function defining how the channel
impulse response decorrelates over time. In particular $A_C(\Delta t = T) = 0$ indicates that observations of the channel
impulse response at times separated by $T$ are uncorrelated and therefore independent, since the channel is a Gaussian
random process. We define the channel coherence time $T_c$ to be the range of values over which $A_C(\Delta t)$ is
approximately nonzero. Thus, the time-varying channel decorrelates after approximately $T_c$ seconds. The function
$S_C(\rho)$ is called the Doppler power spectrum of the channel: as the Fourier transform of an autocorrelation
Figure 3.14: Doppler Power Spectrum, Doppler Spread, and Coherence Time.



it gives the PSD of the received signal as a function of Doppler $\rho$. The maximum $\rho$ value for which $|S_C(\rho)|$ is
greater than zero is called the Doppler spread of the channel, and is denoted by $B_D$. By the Fourier transform
relationship between $A_C(\Delta t)$ and $S_C(\rho)$, $B_D \approx 1/T_c$. If the transmitter and reflectors are all stationary and the
receiver is moving with velocity $v$, then $B_D \leq v/\lambda = f_D$. Recall that in the narrowband fading model samples
became independent at time $\Delta t = .4/f_D$, so in general $B_D \approx k/T_c$ where $k$ depends on the shape of $S_C(\rho)$. We
illustrate the Doppler power spectrum $S_C(\rho)$ and its inverse Fourier transform $A_C(\Delta t)$ in Figure 3.14.



Example 3.7: 

For a channel with Doppler spread $B_D = 80$ Hz, what time separation is required in samples of the received
signal such that the samples are approximately independent?

Solution: The coherence time of the channel is $T_c \approx 1/B_D = 1/80$, so samples spaced 12.5 ms apart are approximately
uncorrelated and thus, given the Gaussian properties of the underlying random process, these samples are
approximately independent.
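
A short sketch ties Example 3.7 to the bound $B_D \leq v/\lambda$ stated earlier. The function names, the 30 m/s receiver speed, and the 900 MHz carrier are assumptions made only for illustration; the first printed value reproduces the 12.5 ms result of the example.

```python
def max_doppler_hz(speed_m_s, carrier_hz, c=3.0e8):
    """Maximum Doppler shift f_D = v / lambda, with lambda = c / f_c."""
    return speed_m_s * carrier_hz / c


def coherence_time_s(doppler_spread_hz, k=1.0):
    """Coherence time T_c ~ k / B_D; k = 1 matches Example 3.7."""
    return k / doppler_spread_hz


# Example 3.7: B_D = 80 Hz gives a coherence time of 12.5 ms.
print(coherence_time_s(80.0))

# Assumed scenario: receiver moving at 30 m/s with a 900 MHz carrier.
f_d = max_doppler_hz(30.0, 900e6)
print(f_d, coherence_time_s(f_d))
```

Passing `k=0.4` instead of the default gives the decorrelation time quoted for the narrowband fading model.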



3.3.4 Transforms for Autocorrelation and Scattering Functions 

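From (3.61) we see that the scattering function $S_c(\tau; \rho)$ defined in (3.53) is the inverse Fourier transform of
$S_C(\Delta f; \rho)$ in the $\Delta f$ variable. Furthermore $S_c(\tau; \rho)$ and $A_C(\Delta f; \Delta t)$ are related by the double Fourier transform

$$S_c(\tau; \rho) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} A_C(\Delta f; \Delta t)\, e^{-j 2\pi \rho \Delta t} e^{j 2\pi \tau \Delta f}\, d\Delta t\, d\Delta f. \qquad (3.63)$$

The relationships among the four functions $A_C(\Delta f; \Delta t)$, $A_c(\tau; \Delta t)$, $S_C(\Delta f; \rho)$, and $S_c(\tau; \rho)$ are shown in
Figure 3.15.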

Empirical measurements of the scattering function for a given channel are often used to approximate empirically
the channel's delay spread, coherence bandwidth, Doppler spread, and coherence time. The delay spread for
a channel with empirical scattering function $S_c(\tau; \rho)$ is obtained by computing the empirical power delay profile
$A_c(\tau)$ from $A_c(\tau, \Delta t) = \mathcal{F}_\rho^{-1}[S_c(\tau; \rho)]$ with $\Delta t = 0$ and then computing the mean and rms delay spread from this
power delay profile. The coherence bandwidth can then be approximated as $B_c \approx 1/\sigma_{T_m}$. Similarly, the Doppler
Figure 3.15: Fourier Transform Relationships



spread $B_D$ is approximated as the range of $\rho$ values over which $S_C(0; \rho)$ is roughly nonzero, with the coherence
time $T_c \approx 1/B_D$.
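
The empirical procedure just described can be sketched for a scattering function sampled on a uniform delay-Doppler grid. Everything specific here is an assumption for illustration: the function name `channel_parameters`, the 1% threshold standing in for "roughly nonzero", and the toy 3-by-3 grid. Evaluating the inverse transform at $\Delta t = 0$ amounts to integrating over $\rho$, and $S_C(0; \rho)$ is the integral of $S_c(\tau; \rho)$ over $\tau$, so both reduce to sums.

```python
import math


def channel_parameters(scattering, delays, dopplers):
    """Estimate delay and Doppler statistics from a sampled scattering function.

    scattering[i][j] holds S_c(tau_i, rho_j) on uniform grids `delays`
    (seconds) and `dopplers` (Hz). Plain sums replace the integrals; the
    grid spacing cancels in the ratios.
    """
    # Power delay profile: A_c(tau) is S_c integrated over Doppler.
    pdp = [sum(row) for row in scattering]
    total = sum(pdp)
    mean_delay = sum(t * p for t, p in zip(delays, pdp)) / total
    rms_delay = math.sqrt(
        sum((t - mean_delay) ** 2 * p for t, p in zip(delays, pdp)) / total
    )
    # Doppler power spectrum: S_c integrated over delay.
    doppler_psd = [sum(row[j] for row in scattering) for j in range(len(dopplers))]
    peak = max(doppler_psd)
    # Doppler spread: largest |rho| whose power exceeds 1% of the peak
    # (assumed threshold for "roughly nonzero").
    spread = max(abs(r) for r, s in zip(dopplers, doppler_psd) if s > 0.01 * peak)
    return {
        "mean_delay": mean_delay,
        "rms_delay": rms_delay,
        "coherence_bandwidth": 1.0 / rms_delay,
        "doppler_spread": spread,
        "coherence_time": 1.0 / spread,
    }


# Toy measurement grid (hypothetical values).
delays = [0.0, 1e-6, 2e-6]
dopplers = [-50.0, 0.0, 50.0]
scattering = [[1.0, 2.0, 1.0], [0.5, 1.0, 0.5], [0.0, 0.0, 0.0]]
params = channel_parameters(scattering, delays, dopplers)
for name, value in params.items():
    print(f"{name}: {value:.4g}")
```

With real measurements the threshold should sit above the noise floor of the measurement equipment, since noise-only bins would otherwise inflate the Doppler spread estimate.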



3.4 Discrete-Time Model 

Often the time-varying impulse response channel model is too complex for simple analysis. In this case a
discrete-time approximation for the wideband multipath model can be used. This discrete-time model, developed by Turin
in [3], is especially useful in the study of spread spectrum systems and RAKE receivers, which are covered in
Chapter 13. This discrete-time model is based on a physical propagation environment consisting of a composition
of isolated point scatterers, as shown in Figure 3.16. In this model, the multipath components are assumed to form
subpath clusters: incoming paths on a given subpath with approximate delay $\tau_n$ are combined, and incoming paths
on different subpath clusters with delays $\tau_n$ and $\tau_m$ where $|\tau_n - \tau_m| > 1/B$ can be resolved, where $B$ denotes the
signal bandwidth.




Figure 3.16: Point Scatterer Channel Model 

The channel model of (3.6) is modified to include a fixed number $N + 1$ of these subpath clusters as

$$c(\tau; t) = \sum_{n=0}^{N} \alpha_n(t)\, e^{-j\phi_n(t)}\, \delta(\tau - \tau_n(t)). \qquad (3.64)$$











The statistics of the received signal for a given $t$ are thus given by the statistics of $\{\tau_n\}_0^N$, $\{\alpha_n\}_0^N$, and $\{\phi_n\}_0^N$.
The model can be further simplified using a discrete time approximation as follows: For a fixed $t$, the time axis
is divided into $M$ equal intervals of duration $T$ such that $MT \geq \sigma_{T_m}$, where $\sigma_{T_m}$ is the rms delay spread of the
channel, which is derived empirically. The subpaths are restricted to lie in one of the $M$ time interval bins, as
shown in Figure 3.17. The multipath spread of this discrete model is $MT$, and the resolution between paths is
$T$. This resolution is based on the transmitted signal bandwidth: $T \approx 1/B$. The statistics for the $n$th bin are that
$r_n$, $1 \leq n \leq M$, is a binary indicator of the existence of a multipath component in the $n$th bin: so $r_n$ is one
if there is a multipath component in the $n$th bin and zero otherwise. If $r_n = 1$ then $(a_n, \theta_n)$, the amplitude and
phase corresponding to this multipath component, follow an empirically determined distribution. This distribution
is obtained by sample averages of $(a_n, \theta_n)$ for each $n$ at different locations in the propagation environment. The
empirical distributions of $(a_n, \theta_n)$ and $(a_m, \theta_m)$, $n \neq m$, are generally different: they may correspond to the same
family of fading but with different parameters (e.g. Ricean fading with different K factors), or they may correspond
to different fading distributions altogether (e.g. Rayleigh fading for the $n$th bin, Nakagami fading for the $m$th bin).
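
A single snapshot of this binned model can be sketched as follows. The distributions used are placeholder assumptions, not Turin's empirical ones: each bin is occupied independently with a fixed probability, amplitudes are Rayleigh, and phases are uniform. The function name `snapshot`, the 1 MHz bandwidth, and the occupancy probability are likewise hypothetical.

```python
import math
import random


def snapshot(num_bins, occupancy_prob, rayleigh_sigma, rng):
    """Draw one snapshot of the discrete-time multipath model.

    Returns a list of (r_n, a_n, theta_n) for n = 1..num_bins, where r_n
    indicates whether bin n holds a multipath component. The Bernoulli
    occupancy and Rayleigh/uniform fading are placeholder assumptions; the
    model itself calls for empirically determined distributions.
    """
    bins = []
    for _ in range(num_bins):
        if rng.random() < occupancy_prob:
            # Rayleigh amplitude as the magnitude of a complex Gaussian.
            in_phase = rng.gauss(0.0, rayleigh_sigma)
            quadrature = rng.gauss(0.0, rayleigh_sigma)
            amp = math.hypot(in_phase, quadrature)
            phase = rng.uniform(0.0, 2.0 * math.pi)
            bins.append((1, amp, phase))
        else:
            bins.append((0, 0.0, 0.0))
    return bins


bandwidth_hz = 1e6                  # assumed signal bandwidth B
bin_width_s = 1.0 / bandwidth_hz    # resolution T ~ 1/B
profile = snapshot(num_bins=10, occupancy_prob=0.3,
                   rayleigh_sigma=1.0, rng=random.Random(0))
for n, (r, a, theta) in enumerate(profile, start=1):
    if r:
        center_us = (n - 0.5) * bin_width_s * 1e6
        print(f"bin {n}: delay ~ {center_us:.1f} us, a = {a:.3f}, theta = {theta:.3f}")
```

Per-bin distributions could be supplied instead of the single `rayleigh_sigma`, which would reflect the statement that different bins generally follow different fading statistics.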






Figure 3.17: Discrete Time Approximation 

This completes the statistical model for the discrete time approximation for a single snapshot. A sequence
of profiles will model the signal over time as the channel impulse response changes, e.g. the impulse response
seen by a receiver moving at some nonzero velocity through a city. Thus, the model must include not only the first
order statistics of $(r_n, a_n, \theta_n)$ for each profile (equivalently, each $t$), but also the temporal and spatial correlations
(assumed Markov) between them. More details on the model and the empirically derived distributions for $N$ and
for $(r_n, a_n, \theta_n)$ can be found in [3].



3.5 Space-Time Channel Models 

Multiple antennas at the transmitter and/or receiver are becoming very common in wireless systems, due to their
diversity and capacity benefits. Systems with multiple antennas require channel models that characterize both
spatial (angle of arrival) and temporal characteristics of the channel. A typical model assumes the channel is
composed of several scattering centers which generate the multipath [23, 24]. The location of the scattering centers
relative to the receiver dictates the angle of arrival (AOA) of the corresponding multipath components. Models can
be either two dimensional or three dimensional.

Consider a two-dimensional multipath environment where the receiver or transmitter has an antenna array 
with M elements. The time-varying impulse response model (3.6) can be extended to incorporate AOA for the 
array as follows. 

$$c(\tau, t) = \sum_{n=0}^{N(t)} \alpha_n(t)\, e^{-j\phi_n(t)}\, \mathbf{a}(\theta_n(t))\, \delta(\tau - \tau_n(t)), \qquad (3.65)$$

where $\phi_n(t)$ corresponds to the phase shift at the origin of the array and $\mathbf{a}(\theta_n(t))$ is the array response vector given
by

$$\mathbf{a}(\theta_n(t)) = [e^{-j\psi_{n,1}}, \ldots, e^{-j\psi_{n,M}}]^T, \qquad (3.66)$$
where $\psi_{n,i} = [x_i \cos\theta_n(t) + y_i \sin\theta_n(t)]\, 2\pi/\lambda$ for $(x_i, y_i)$ the antenna location relative to the origin and $\theta_n(t)$ the
AOA of the multipath relative to the origin of the antenna array. Assume the AOA is stationary and identically
distributed for all multipath components and denote this random AOA by $\theta$. Let $A(\theta)$ denote the average received
signal power as a function of $\theta$. Then we define the mean and rms angular spread in terms of this power profile as



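$$\mu_\theta = \frac{\int_{-\pi}^{\pi} \theta A(\theta)\, d\theta}{\int_{-\pi}^{\pi} A(\theta)\, d\theta}, \qquad (3.67)$$

and

$$\sigma_\theta = \sqrt{\frac{\int_{-\pi}^{\pi} (\theta - \mu_\theta)^2 A(\theta)\, d\theta}{\int_{-\pi}^{\pi} A(\theta)\, d\theta}}. \qquad (3.68)$$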



We say that two signals received at AOAs separated by $1/\sigma_\theta$ are roughly uncorrelated. More details on the power
distribution relative to the AOA for different propagation environments along with the corresponding correlations
across antenna elements can be found in [24].
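
The angular spread definitions (3.67) and (3.68) have the same form as the delay spread definitions, so they can be evaluated the same way on a sampled power profile. In the sketch below, the function name `angular_spread`, the grid size, and the profile (power spread evenly over about 30 degrees either side of broadside) are all assumptions chosen for illustration.

```python
import math


def angular_spread(angles_rad, powers):
    """Mean and rms angular spread of a sampled power-angle profile.

    angles_rad are uniformly spaced samples in [-pi, pi]; sums stand in for
    the integrals in (3.67) and (3.68), since the spacing cancels in the ratios.
    """
    total = sum(powers)
    mean = sum(a * p for a, p in zip(angles_rad, powers)) / total
    var = sum((a - mean) ** 2 * p for a, p in zip(angles_rad, powers)) / total
    return mean, math.sqrt(var)


# Hypothetical profile: equal power within about +/- 30 degrees of broadside.
n = 721
angles = [-math.pi + 2.0 * math.pi * k / (n - 1) for k in range(n)]
half_width = math.radians(30.0)
powers = [1.0 if abs(a) <= half_width else 0.0 for a in angles]
mean_theta, rms_theta = angular_spread(angles, powers)
print(f"mean = {math.degrees(mean_theta):.2f} deg, "
      f"rms = {math.degrees(rms_theta):.2f} deg")
```

For power spread evenly over a sector of half-width $w$, the rms angular spread is close to $w/\sqrt{3}$, which gives a quick sanity check on the numerical result.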

Extending the two dimensional models to three dimensions requires characterizing the elevation AOAs for
multipath as well as the azimuth angles. Different models for such 3-D channels have been proposed in [25, 26, 27].
In [23] the Jakes model is extended to produce spatio-temporal characteristics using the ideas of [25, 26, 27].
Several other papers on spatio-temporal modeling can be found in [29].
Chapter 3 Problems

1. Consider a two-path channel consisting of a direct ray plus a ground-reflected ray where the transmitter is a
fixed base station at height $h$ and the receiver is mounted on a truck also at height $h$. The truck starts next to
the base station and moves away at velocity $v$. Assume signal attenuation on each path follows a free-space
path loss model. Find the time-varying channel impulse response at the receiver for transmitter-receiver separation
$d = vt$ sufficiently large such that the length of the reflected path can be approximated by $r + r' \approx d + 2h^2/d$.

2. Find a formula for the multipath delay spread $T_m$ for a two-path channel model. Find a simplified formula
when the transmitter-receiver separation is relatively large. Compute $T_m$ for $h_t = 10$ m, $h_r = 4$ m, and
$d = 100$ m.

3. Consider a time-invariant indoor wireless channel with LOS component at delay 23 nsec, a multipath com- 
ponent at delay 48 nsec, and another multipath component at delay 67 nsec. Find the delay spread assuming 
the demodulator synchronizes to the LOS component. Repeat assuming that the demodulator synchronizes 
to the first multipath component. 

4. Show that the minimum value of $f_c \tau_n$ for a system at $f_c = 1$ GHz with a fixed transmitter and a receiver
separated by more than 10 m from the transmitter is much greater than 1.

5. Prove that for $X$ and $Y$ independent zero-mean Gaussian random variables with variance $\sigma^2$, the random
variable $Z = \sqrt{X^2 + Y^2}$ is Rayleigh-distributed and $Z^2$ is exponentially distributed.

6. Assume a Rayleigh fading channel with the average signal power $2\sigma^2 = -80$ dBm. What is the power
outage probability of this channel relative to the threshold $P_0 = -95$ dBm? How about $P_0 = -90$ dBm?

7. Assume an application that requires a power outage probability of .01 for the threshold $P_0 = -80$ dBm. For
Rayleigh fading, what value of the average signal power is required?

8. Assume a Rician fading channel with $2\sigma^2 = -80$ dBm and a target power of $P_0 = -80$ dBm. Find the
outage probability assuming that the LOS component has average power $s^2 = -80$ dBm.

9. This problem illustrates that the tails of the Ricean distribution can be quite different than its Nakagami
approximation. Plot the CDF of the Ricean distribution for $K = 1, 5, 10$ and the corresponding Nakagami
distribution with $m = (K + 1)^2/(2K + 1)$. In general, does the Ricean distribution or its Nakagami
approximation have a larger outage probability $p(\gamma < x)$ for $x$ large?

10. In order to improve the performance of cellular systems, multiple base stations can receive the signal trans- 
mitted from a given mobile unit and combine these multiple signals either by selecting the strongest one 
or summing the signals together, perhaps with some optimized weights. This typically increases SNR and 
reduces the effects of shadowing. Combining of signals received from multiple base stations is called macro- 
diversity, and in this problem we explore the benefits of this technique. Diversity will be covered in more 
detail in Chapter 7. 

Consider a mobile at the midpoint between two base stations in a cellular network. The received signals (in 
dBW) from the base stations are given by 



$$P_{r,1} = W + Z_1,$$

$$P_{r,2} = W + Z_2,$$

where $Z_{1,2}$ are $N(0, \sigma^2)$ random variables. We define outage with macrodiversity to be the event that both
$P_{r,1}$ and $P_{r,2}$ fall below a threshold $T$.




(a) Interpret the terms $W$, $Z_1$, $Z_2$ in $P_{r,1}$ and $P_{r,2}$.

(b) If $Z_1$ and $Z_2$ are independent, show that the outage probability is given by

$$P_{out} = [Q(\Delta/\sigma)]^2,$$

where $\Delta = W - T$ is the fade margin at the mobile's location.

(c) Now suppose $Z_1$ and $Z_2$ are correlated in the following way:

$$Z_1 = aY_1 + bY,$$

$$Z_2 = aY_2 + bY,$$

where $Y$, $Y_1$, $Y_2$ are independent $N(0, \sigma^2)$ random variables, and $a$, $b$ are such that $a^2 + b^2 = 1$. Show
that

$$P_{out} = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} \left[Q\left(\frac{\Delta + b y \sigma}{|a| \sigma}\right)\right]^2 e^{-y^2/2}\, dy.$$

(d) Compare the outage probabilities of (b) and (c) for the special case of $a = b = 1/\sqrt{2}$, $\sigma = 8$ and
$\Delta = 5$ (this will require a numerical integration).



1 1 . The goal of this problem is to develop a Rayleigh fading simulator for a mobile communications channel 
using the method based on filtering Gaussian processes based on the in-phase and quadrature PSDs described 
in 3.2.1. In this problem you must do the following: 



(a) Develop simulation code to generate a signal with Rayleigh fading amplitude over time. Your sample 
rate should be at least 1000 samples/sec, the average received envelope should be 1 , and your simulation 
should be parameterized by the Doppler frequency f p. Matlab is the easiest way to generate this 
simulation, but any code is fine. 

(b) Write a description of your simulation that clearly explains how your code generates the fading enve- 
lope using a block diagram and any necessary equations. 

(c) Turn in your well-commented code. 

(d) Provide plots of received amplitude (dB) vs. time for $f_D = 1, 10, 100$ Hz over 2 seconds.

12. For a Rayleigh fading channel with average power $\bar{P}_r = 30$ dB, compute the average fade duration for target
fade values $P_0 = 0$ dB, $P_0 = 15$ dB, and $P_0 = 30$ dB.



13. Derive a formula for the average length of time a Rayleigh fading process with average power $\bar{P}_r$ stays above
a given target fade value $P_0$. Evaluate this average length of time for $\bar{P}_r = 20$ dB, $P_0 = 25$ dB, and $f_D = 50$ Hz.



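14. Assume a Rayleigh fading channel with average power $\bar{P}_r = 10$ dB and Doppler $f_D = 80$ Hz. We would like
to approximate the channel using a finite state Markov model with eight states. The regions $R_j$ correspond
to $R_1 = \{\gamma : -\infty < \gamma \le -10 \text{ dB}\}$, $R_2 = \{\gamma : -10 \text{ dB} \le \gamma \le 0 \text{ dB}\}$, $R_3 = \{\gamma : 0 \text{ dB} \le \gamma \le 5 \text{ dB}\}$,
$R_4 = \{\gamma : 5 \text{ dB} \le \gamma \le 10 \text{ dB}\}$, $R_5 = \{\gamma : 10 \text{ dB} \le \gamma \le 15 \text{ dB}\}$, $R_6 = \{\gamma : 15 \text{ dB} \le \gamma \le 20 \text{ dB}\}$,
$R_7 = \{\gamma : 20 \text{ dB} \le \gamma \le 30 \text{ dB}\}$, $R_8 = \{\gamma : 30 \text{ dB} \le \gamma < \infty\}$. Find the transition probabilities between each region for this model.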

15. Consider the following channel scattering function obtained by sending a 900 MHz sinusoidal input into the
channel:

$$S(\tau, \rho) = \begin{cases} \alpha_1 \delta(\tau) & \rho = 70 \text{ Hz} \\ \alpha_2 \delta(\tau - .022\ \mu\text{sec}) & \rho = 49.5 \text{ Hz} \\ 0 & \text{else} \end{cases}$$

where $\alpha_1$ and $\alpha_2$ are determined by path loss, shadowing, and multipath fading. Clearly this scattering
function corresponds to a 2-ray model. Assume the transmitter and receiver used to send and receive the
sinusoid are located 8 meters above the ground.

(a) Find the distance and velocity between the transmitter and receiver. 

(b) For the distance computed in part (a), is the path loss as a function of distance proportional to $d^{-2}$ or
$d^{-4}$? Hint: use the fact that the channel is based on a 2-ray model.

(c) Does a 30 KHz voice signal transmitted over this channel experience flat or frequency-selective fading? 

16. Consider a wideband channel characterized by the autocorrelation function

$$A_c(\tau, \Delta t) = \begin{cases} \text{sinc}(W \Delta t) & 0 \le \tau \le 10\ \mu\text{sec} \\ 0 & \text{else} \end{cases}$$

where $W = 100$ Hz and $\text{sinc}(x) = \sin(\pi x)/(\pi x)$.



(a) Does this channel correspond to an indoor channel or an outdoor channel, and why? 

(b) Sketch the scattering function of this channel. 

(c) Compute the channel’s average delay spread, rms delay spread, and Doppler spread. 

(d) Over approximately what range of data rates will a signal transmitted over this channel exhibit
frequency-selective fading?

(e) Would you expect this channel to exhibit Rayleigh or Ricean fading statistics, and why? 

(f) Assuming that the channel exhibits Rayleigh fading, what is the average length of time that the signal
power is continuously below its average value?

(g) Assume a system with narrowband binary modulation sent over this channel. Your system has error
correction coding that can correct two simultaneous bit errors. Assume also that you always make an
error if the received signal power is below its average value, and never make an error if this power is
at or above its average value. If the channel is Rayleigh fading, then what is the maximum data rate
that can be sent over this channel with error-free transmission, making the approximation that the fade
duration never exceeds twice its average value?

17. Let a scattering function $S(\tau, \rho)$ be nonzero over $0 \le \tau \le .1$ ms and $-.1 \le \rho \le .1$ Hz. Assume that the
power of the scattering function is approximately uniform over the range where it is nonzero.

(a) What are the multipath spread and the Doppler spread of the channel?

(b) Suppose you input to this channel two identical sinusoids separated in time by $\Delta t$. What is the
minimum value of $\Delta t$ for which the channel response to the first sinusoid is approximately independent of
the channel response to the second sinusoid?

(c) For two sinusoidal inputs to the channel $u_1(t) = \sin 2\pi f t$ and $u_2(t) = \sin 2\pi (f + \Delta f) t$, what is the
minimum value of $\Delta f$ for which the channel response to $u_1(t)$ is approximately independent of the
channel response to $u_2(t)$?

(d) Will this channel exhibit flat fading or frequency-selective fading for a typical voice channel with a 3
KHz bandwidth? How about for a cellular channel with a 30 KHz bandwidth?

Chapter 4

Capacity of Wireless Channels 



The growing demand for wireless communication makes it important to determine the capacity limits of these 
channels. These capacity limits dictate the maximum data rates that can be transmitted over wireless channels 
with asymptotically small error probability, assuming no constraints on delay or complexity of the encoder and 
decoder. Channel capacity was pioneered by Claude Shannon in the late 1940s, using a mathematical theory 
of communication based on the notion of mutual information between the input and output of a channel [1, 2,
3]. Shannon defined capacity as the mutual information maximized over all possible input distributions. The
significance of this mathematical construct was Shannon’s coding theorem and converse, which proved that a code 
did exist that could achieve a data rate close to capacity with negligible probability of error, and that any data rate 
higher than capacity could not be achieved without an error probability bounded away from zero. Shannon’s ideas 
were quite revolutionary at the time, given the high data rates he predicted were possible on telephone channels and 
the notion that coding could reduce error probability without reducing data rate or causing bandwidth expansion. 
In time sophisticated modulation and coding technology validated Shannon’s theory such that on telephone lines 
today, we achieve data rates very close to Shannon capacity with very low probability of error. These sophisticated 
modulation and coding strategies are treated in Chapters 5 and 8, respectively. 

In this chapter we examine the capacity of a single-user wireless channel where the transmitter and/or receiver 
have a single antenna. Capacity of single-user systems where the transmitter and receiver have multiple antennas 
is treated in Chapter 10 and capacity of multiuser systems is treated in Chapter 14. We will discuss capacity for 
channels that are both time-invariant and time-varying. We first look at the well-known formula for capacity of 
a time-invariant AWGN channel. We next consider capacity of time-varying flat-fading channels. Unlike in the
AWGN case, capacity of a flat-fading channel is not given by a single formula, since capacity depends on what is 
known about the time-varying channel at the transmitter and/or receiver. Moreover, for different channel informa- 
tion assumptions, there are different definitions of channel capacity, depending on whether capacity characterizes 
the maximum rate averaged over all fading states or the maximum constant rate that can be maintained in all fading 
states (with or without some probability of outage). 

We will consider flat-fading channel capacity where only the fading distribution is known at the transmitter 
and receiver. Capacity under this assumption is typically very difficult to determine, and is only known in a few 
special cases. Next we consider capacity when the channel fade level is known at the receiver only (via receiver
estimation) or when the channel fade level is known at both the transmitter and the receiver (via receiver estimation
and transmitter feedback). We will see that the fading channel capacity with channel side information at both the
transmitter and receiver is achieved when the transmitter adapts its power, data rate, and coding scheme to the
channel variation. The optimal power allocation in this case is a “water-filling” in time, where power and data rate
are increased when channel conditions are favorable and decreased when channel conditions are not favorable.

We will also treat capacity of frequency-selective fading channels. For time-invariant frequency-selective
channels the capacity is known and is achieved with an optimal power allocation that water-fills over frequency
instead of time. The capacity of a time-varying frequency-selective fading channel is unknown in general. However,
this channel can be approximated as a set of independent parallel flat-fading channels, whose capacity is the sum
of capacities on each channel with power optimally allocated among the channels. The capacity of this channel is
known and is obtained with an optimal power allocation that water-fills over both time and frequency.

We will consider only discrete-time systems in this chapter. Most continuous-time systems can be converted 
to discrete-time systems via sampling, and then the same capacity results hold. However, care must be taken in 
choosing the appropriate sampling rate for this conversion, since time variations in the channel may increase the 
sampling rate required to preserve channel capacity [4]. 

4.1 Capacity in AWGN 

Consider a discrete-time additive white Gaussian noise (AWGN) channel with channel input/output relationship
$y[i] = x[i] + n[i]$, where $x[i]$ is the channel input at time $i$, $y[i]$ is the corresponding channel output, and $n[i]$ is a
white Gaussian noise random process. Assume a channel bandwidth $B$ and transmit power $P$. The channel SNR,
the power in $x[i]$ divided by the power in $n[i]$, is constant and given by $\gamma = P/(N_0 B)$, where $N_0$ is the power
spectral density of the noise. The capacity of this channel is given by Shannon’s well-known formula [1]:

$$C = B \log_2(1 + \gamma), \tag{4.1}$$

where the capacity units are bits/second (bps). Shannon’s coding theorem proves that a code exists that achieves
data rates arbitrarily close to capacity with arbitrarily small probability of bit error. The converse theorem shows
that any code with rate $R > C$ has a probability of error bounded away from zero. The theorems are proved using
the concept of mutual information between the input and output of a channel. For a memoryless time-invariant
channel with random input $x$ and random output $y$, the channel’s mutual information is defined as

$$I(X;Y) = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p(x,y) \log\left(\frac{p(x,y)}{p(x)p(y)}\right), \tag{4.2}$$

where the sum is taken over all possible input and output pairs $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ for $\mathcal{X}$ and $\mathcal{Y}$ the input and
output alphabets. The log function is typically with respect to base 2, in which case the units of mutual
information are bits per second. Mutual information can also be written in terms of the entropy in the channel
output $y$ and conditional output $y|x$ as $I(X;Y) = H(Y) - H(Y|X)$, where $H(Y) = -\sum_{y \in \mathcal{Y}} p(y) \log p(y)$ and
$H(Y|X) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p(x,y) \log p(y|x)$. Shannon proved that channel capacity equals the mutual information
of the channel maximized over all possible input distributions:

$$C = \max_{p(x)} I(X;Y) = \max_{p(x)} \sum_{x,y} p(x,y) \log\left(\frac{p(x,y)}{p(x)p(y)}\right). \tag{4.3}$$

For the AWGN channel, the maximizing input distribution is Gaussian, which results in the channel capacity given
by (4.1). For channels with memory, mutual information and channel capacity are defined relative to input and
output sequences $x^n$ and $y^n$. More details on channel capacity, mutual information, and the coding theorem and
converse can be found in [2, 5, 6].
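
As a small illustration of the mutual information definition and of capacity as its maximum over input distributions, the following Python sketch evaluates $I(X;Y)$ for a discrete memoryless channel and maximizes it by a simple grid search. The binary symmetric channel, its crossover probability of 0.1, and the function names are assumptions chosen for illustration; they do not come from the text, and the grid search is only one simple way to carry out the maximization.

```python
import math

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for input pmf p_x[x] and transition matrix p_y_given_x[x][y]."""
    n_out = len(p_y_given_x[0])
    # Output marginal p(y) = sum_x p(x) p(y|x)
    p_y = [sum(p_x[x] * p_y_given_x[x][y] for x in range(len(p_x)))
           for y in range(n_out)]
    info = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_out):
            p_xy = px * p_y_given_x[x][y]
            if p_xy > 0.0:  # terms with p(x, y) = 0 contribute nothing
                info += p_xy * math.log2(p_xy / (px * p_y[y]))
    return info

def capacity_binary_input(p_y_given_x, steps=1000):
    """Grid search over p(x=0) in [0, 1]; returns (max mutual information, best p(x=0))."""
    best = (0.0, 0.0)
    for k in range(steps + 1):
        q = k / steps
        best = max(best, (mutual_information([q, 1.0 - q], p_y_given_x), q))
    return best

# Binary symmetric channel with crossover probability 0.1 (illustrative value)
eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]
cap, q_star = capacity_binary_input(bsc)
print(f"max mutual information ~ {cap:.4f} bits at p(x=0) = {q_star:.2f}")
```

For this symmetric channel the search should settle on the uniform input distribution, which matches the known closed form for the binary symmetric channel.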

The proofs of the coding theorem and converse place no constraints on the complexity or delay of the com- 
munication system. Therefore, Shannon capacity is generally used as an upper bound on the data rates that can be 
achieved under real system constraints. At the time that Shannon developed his theory of information, data rates 
over standard telephone lines were on the order of 100 bps. Thus, it was believed that Shannon capacity, which
predicted speeds of roughly 30 Kbps over the same telephone lines, was not a very useful bound for real systems.
However, breakthroughs in hardware, modulation, and coding techniques have brought commercial modems of
today very close to the speeds predicted by Shannon in the 1950s. In fact, modems can exceed this 30 Kbps
Shannon limit on some telephone channels, but that is because transmission lines today are of better quality than in
Shannon’s day and thus have a higher received power than that used in Shannon’s initial calculation. On AWGN
radio channels, turbo codes have come within a fraction of a dB of the Shannon capacity limit [7].

Wireless channels typically exhibit flat or frequency-selective fading. In the next two sections we consider ca- 
pacity of flat-fading and frequency-selective fading channels under different assumptions regarding what is known 
about the channel. 



Example 4.1: Consider a wireless channel where power falloff with distance follows the formula $P_r(d) =
P_t (d_0/d)^3$ for $d_0 = 10$ m. Assume the channel has bandwidth $B = 30$ KHz and AWGN with noise power
spectral density of $N_0 = 10^{-9}$ W/Hz. For a transmit power of 1 W, find the capacity of this channel for a
transmit-receive distance of 100 m and 1 Km.

Solution: The received SNR is $\gamma = P_r(d)/(N_0 B) = .1^3/(10^{-9} \times 30 \times 10^3) = 33 = 15$ dB for $d = 100$
m and $\gamma = .01^3/(10^{-9} \times 30 \times 10^3) = .033 = -15$ dB for $d = 1000$ m. The corresponding capacities are
$C = B \log_2(1 + \gamma) = 30000 \log_2(1 + 33) = 152.6$ Kbps for $d = 100$ m and $C = 30000 \log_2(1 + .033) = 1.4$
Kbps for $d = 1000$ m. Note the significant decrease in capacity at farther distances, due to the path loss exponent
of 3, which greatly reduces received power as distance increases.
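
The arithmetic in Example 4.1 can be checked with a few lines of Python. This is a minimal sketch: the function and variable names are illustrative assumptions, and because the code does not round the SNR to 33 and .033 before taking the logarithm, its results may differ slightly in the last digits from the figures quoted in the solution.

```python
import math

def awgn_capacity_bps(bandwidth_hz, snr_linear):
    """Shannon capacity C = B log2(1 + SNR) of an AWGN channel, in bits/second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

def received_snr(p_t, d, d0, n0, bandwidth_hz, exponent=3):
    """Received SNR for the power falloff model P_r(d) = P_t (d0/d)^exponent."""
    p_r = p_t * (d0 / d) ** exponent
    return p_r / (n0 * bandwidth_hz)

# Parameters of Example 4.1: B = 30 KHz, N0 = 1e-9 W/Hz, P_t = 1 W, d0 = 10 m
B, N0, PT, D0 = 30e3, 1e-9, 1.0, 10.0
for d in (100.0, 1000.0):
    snr = received_snr(PT, d, D0, N0, B)
    print(f"d = {d:6.0f} m: SNR = {10 * math.log10(snr):6.2f} dB, "
          f"C = {awgn_capacity_bps(B, snr) / 1e3:7.2f} Kbps")
```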



4.2 Capacity of Flat-Fading Channels 

4.2.1 Channel and System Model 

We assume a discrete-time channel with stationary and ergodic time-varying gain $\sqrt{g[i]}$, $0 \le g[i]$, and AWGN
$n[i]$, as shown in Figure 4.1. The channel power gain $g[i]$ follows a given distribution $p(g)$, e.g. for Rayleigh
fading $p(g)$ is exponential. We assume that $g[i]$ is independent of the channel input. The channel gain $g[i]$ can
change at each time $i$, either as an i.i.d. process or with some correlation over time. In a block fading channel
$g[i]$ is constant over some blocklength $T$ after which time $g[i]$ changes to a new independent value based on the
distribution $p(g)$. Let $\bar{P}$ denote the average transmit signal power, $N_0/2$ denote the noise power spectral density of
$n[i]$, and $B$ denote the received signal bandwidth. The instantaneous received signal-to-noise ratio (SNR) is then
$\gamma[i] = \bar{P} g[i]/(N_0 B)$, $0 \le \gamma[i] < \infty$, and its expected value over all time is $\bar{\gamma} = \bar{P} \bar{g}/(N_0 B)$. Since $\bar{P}/(N_0 B)$ is a
constant, the distribution of $g[i]$ determines the distribution of $\gamma[i]$ and vice versa.
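
To make the relation between the gain distribution and the SNR distribution concrete, the sketch below draws i.i.d. instantaneous SNR samples for Rayleigh fading, for which the power gain and therefore the SNR are exponentially distributed. The average SNR of 100 (20 dB), the sample count, the fixed seed, and the function name are illustrative assumptions, not values taken from the text.

```python
import random

def rayleigh_snr_samples(avg_snr, n, seed=0):
    """Draw n i.i.d. instantaneous SNRs for Rayleigh fading.

    For Rayleigh fading the power gain g is exponential; scaling g by the
    constant P/(N0*B) leaves the SNR exponential with mean avg_snr.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # expovariate takes the rate parameter, i.e. 1/mean
    return [rng.expovariate(1.0 / avg_snr) for _ in range(n)]

samples = rayleigh_snr_samples(avg_snr=100.0, n=100_000)
print(f"sample mean SNR = {sum(samples) / len(samples):.2f}")
```

The sample mean should land close to the chosen average SNR, which is the time-average interpretation of $\bar{\gamma}$ for an ergodic process.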

The system model is also shown in Figure 4.1, where an input message $w$ is sent from the transmitter to the
receiver. The message is encoded into the codeword $x$, which is transmitted over the time-varying channel as $x[i]$
at time $i$. The channel gain $g[i]$, also called the channel side information (CSI), changes during the transmission
of the codeword.

The capacity of this channel depends on what is known about $g[i]$ at the transmitter and receiver. We will
consider three different scenarios regarding this knowledge:

1. Channel Distribution Information (CDI): The distribution of $g[i]$ is known to the transmitter and receiver.

2. Receiver CSI: The value of $g[i]$ is known at the receiver at time $i$, and both the transmitter and receiver
know the distribution of $g[i]$.

3. Transmitter and Receiver CSI: The value of $g[i]$ is known at the transmitter and receiver at time $i$, and
both the transmitter and receiver know the distribution of $g[i]$.

Figure 4.1: Flat-Fading Channel and System Model.

Transmitter and receiver CSI allow the transmitter to adapt both its power and rate to the channel gain at time $i$, and
lead to the highest capacity of the three scenarios. Note that since the instantaneous SNR $\gamma[i]$ is just $g[i]$ multiplied
by the constant $\bar{P}/(N_0 B)$, known CSI or CDI about $g[i]$ yields the same information about $\gamma[i]$. Capacity for
time-varying channels under assumptions other than these three is discussed in [8, 9].

4.2.2 Channel Distribution Information (CDI) Known 

We first consider the case where the channel gain distribution $p(g)$ or, equivalently, the distribution of SNR $p(\gamma)$
is known to the transmitter and receiver. For i.i.d. fading the capacity is given by (4.3), but solving for the 
capacity-achieving input distribution, i.e. the distribution achieving the maximum in (4.3), can be quite compli- 
cated depending on the fading distribution. Moreover, fading correlation introduces channel memory, in which 
case the capacity-achieving input distribution is found by optimizing over input blocks, which makes finding the 
solution even more difficult. For these reasons, finding the capacity-achieving input distribution and corresponding 
capacity of fading channels under CDI remains an open problem for almost all channel distributions. 

The capacity-achieving input distribution and corresponding fading channel capacity under CDI is known for 
two specific models of interest: i.i.d. Rayleigh fading channels and FSMCs. In i.i.d. Rayleigh fading the channel 
power gain is exponential and changes independently with each channel use. The optimal input distribution for this 
channel was shown in [10] to be discrete with a finite number of mass points, one of which is located at zero. This 
optimal distribution and its corresponding capacity must be found numerically. The lack of closed-form solutions 
for capacity or the optimal input distribution is somewhat surprising given the fact that the fading follows the 
most common fading distribution and has no correlation structure. For flat-fading channels that are not necessarily
Rayleigh or i.i.d., upper and lower bounds on capacity have been determined in [11], and these bounds are tight at
high SNRs.

The use of FSMCs to approximate Rayleigh fading channels was discussed in Section 3.2.4. This model approximates
the fading correlation as a Markov process. While the Markov nature of the fading dictates that the fading at a 
given time depends only on fading at the previous time sample, it turns out that the receiver must decode all past 
channel outputs jointly with the current output for optimal (i.e. capacity-achieving) decoding. This significantly 
complicates capacity analysis. The capacity of FSMCs has been derived for i.i.d. inputs in [13, 14] and for general 
inputs in [15]. Capacity of the FSMC depends on the limiting distribution of the channel conditioned on all past 
inputs and outputs, which can be computed recursively. As with the i.i.d. Rayleigh fading channel, the complexity 
of the capacity analysis along with the final result for this relatively simple fading model is very high, indicating 
the difficulty of obtaining the capacity and related design insights into channels when only CDI is available.

4.2.3 Channel Side Information at Receiver



We now consider the case where the CSI $g[i]$ is known at the receiver at time $i$. Equivalently, $\gamma[i]$ is known at the
receiver at time $i$. We also assume that both the transmitter and receiver know the distribution of $g[i]$. In this case
there are two channel capacity definitions that are relevant to system design: Shannon capacity, also called ergodic
capacity, and capacity with outage. As for the AWGN channel, Shannon capacity defines the maximum data rate
that can be sent over the channel with asymptotically small error probability. Note that for Shannon capacity the 
rate transmitted over the channel is constant: the transmitter cannot adapt its transmission strategy relative to the 
CSI. Thus, poor channel states typically reduce Shannon capacity since the transmission strategy must incorporate 
the effect of these poor states. An alternate capacity definition for fading channels with receiver CSI is capacity 
with outage. Capacity with outage is defined as the maximum rate that can be transmitted over a channel with 
some outage probability corresponding to the probability that the transmission cannot be decoded with negligible 
error probability. The basic premise of capacity with outage is that a high data rate can be sent over the channel 
and decoded correctly except when the channel is in deep fading. By allowing the system to lose some data in the 
event of deep fades, a higher data rate can be maintained than if all data must be received correctly regardless of 
the fading state, as is the case for Shannon capacity. The probability of outage characterizes the probability of data 
loss or, equivalently, of deep fading. 



Shannon (Ergodic) Capacity 



Shannon capacity of a fading channel with receiver CSI for an average power constraint $\bar{P}$ can be obtained from
results in [16] as

$$C = \int_0^{\infty} B \log_2(1 + \gamma)\, p(\gamma)\, d\gamma. \tag{4.4}$$


Note that this formula is a probabilistic average, i.e. Shannon capacity is equal to Shannon capacity for an AWGN
channel with SNR $\gamma$, given by $B \log_2(1 + \gamma)$, averaged over the distribution of $\gamma$. That is why Shannon capacity
is also called ergodic capacity. However, care must be taken in interpreting (4.4) as an average. In particular, it is
incorrect to interpret (4.4) to mean that this average capacity is achieved by maintaining a capacity $B \log_2(1 + \gamma)$
when the instantaneous SNR is $\gamma$, since only the receiver knows the instantaneous SNR $\gamma[i]$, and therefore the
data rate transmitted over the channel is constant, regardless of $\gamma$. Note, also, the capacity-achieving code must be
sufficiently long so that a received codeword is affected by all possible fading states. This can result in significant
delay.

By Jensen’s inequality,

$$E[B \log_2(1 + \gamma)] = \int B \log_2(1 + \gamma)\, p(\gamma)\, d\gamma \le B \log_2(1 + E[\gamma]) = B \log_2(1 + \bar{\gamma}), \tag{4.5}$$

where $\bar{\gamma}$ is the average SNR on the channel. Thus we see that the Shannon capacity of a fading channel with
receiver CSI only is less than the Shannon capacity of an AWGN channel with the same average SNR. In other 
words, fading reduces Shannon capacity when only the receiver has CSI. Moreover, without transmitter CSI, the 
code design must incorporate the channel correlation statistics, and the complexity of the maximum likelihood 
decoder will be proportional to the channel decorrelation time. In addition, if the receiver CSI is not perfect, 
capacity can be significantly decreased [20]. 
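
The gap implied by (4.5) can be examined numerically. The sketch below evaluates (4.4), normalized by $B$, for Rayleigh fading (exponentially distributed $\gamma$) and compares it with the AWGN capacity at the same average SNR. The 20 dB average SNR, the midpoint integration rule, its step count, and the truncation point are assumptions made for this illustration.

```python
import math

def ergodic_capacity_rayleigh(avg_snr, n_steps=100_000, u_max=40.0):
    """Approximate (1/B) * integral of log2(1 + g) p(g) dg for exponential g.

    Uses the substitution u = g / avg_snr, so that p(u) = exp(-u), and a
    midpoint rule on [0, u_max]; the tail beyond u_max = 40 is negligible.
    """
    h = u_max / n_steps
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * h
        total += math.log2(1.0 + avg_snr * u) * math.exp(-u)
    return total * h

avg_snr = 10 ** (20 / 10)  # 20 dB average SNR, an illustrative choice
c_fading = ergodic_capacity_rayleigh(avg_snr)
c_awgn = math.log2(1.0 + avg_snr)
print(f"fading (receiver CSI): {c_fading:.3f} bps/Hz, AWGN: {c_awgn:.3f} bps/Hz")
```

Under these assumptions the fading value falls below the AWGN value, as Jensen’s inequality requires.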



Example 4.2: Consider a flat-fading channel with i.i.d. channel gain $\sqrt{g[i]}$ which can take on three possible values:
$\sqrt{g_1} = .05$ with probability $p_1 = .1$, $\sqrt{g_2} = .5$ with probability $p_2 = .5$, and $\sqrt{g_3} = 1$ with probability $p_3 = .4$.
The transmit power is 10 mW, the noise spectral density is $N_0 = 10^{-9}$ W/Hz, and the channel bandwidth is 30
KHz. Assume the receiver has knowledge of the instantaneous value of $g[i]$ but the transmitter does not. Find the
Shannon capacity of this channel and compare with the capacity of an AWGN channel with the same average SNR.

Solution: The channel has 3 possible received SNRs, $\gamma_1 = P_t g_1/(N_0 B) = .01 \times (.05^2)/(30000 \times 10^{-9}) = .8333 =
-.79$ dB, $\gamma_2 = P_t g_2/(N_0 B) = .01 \times (.5^2)/(30000 \times 10^{-9}) = 83.333 = 19.2$ dB, and $\gamma_3 = P_t g_3/(N_0 B) =
.01/(30000 \times 10^{-9}) = 333.33 = 25$ dB. The probabilities associated with each of these SNR values are $p(\gamma_1) = .1$,
$p(\gamma_2) = .5$, and $p(\gamma_3) = .4$. Thus, the Shannon capacity is given by

$$C = \sum_i B \log_2(1 + \gamma_i)\, p(\gamma_i) = 30000(.1 \log_2(1.8333) + .5 \log_2(84.333) + .4 \log_2(334.33)) = 199.26 \text{ Kbps}.$$

The average SNR for this channel is $\bar{\gamma} = .1(.8333) + .5(83.33) + .4(333.33) = 175.08 = 22.43$ dB. The
capacity of an AWGN channel with this SNR is $C = B \log_2(1 + 175.08) = 223.8$ Kbps. Note that this
rate is about 25 Kbps larger than that of the flat-fading channel with receiver CSI and the same average SNR.
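
The computation in Example 4.2 can be reproduced as follows. This is a sketch only: variable names are illustrative, and the three tabulated gains are treated as amplitude gains that are squared to obtain the power gain, which is the assumption implied by the squared terms in the solution. Small differences in the last digits from the figures quoted above are possible because the code does not round intermediate values.

```python
import math

B = 30e3      # bandwidth, Hz
N0 = 1e-9     # noise power spectral density, W/Hz
P_T = 10e-3   # transmit power, W
# (amplitude gain, probability); the SNR uses the squared amplitude gain
states = [(0.05, 0.1), (0.5, 0.5), (1.0, 0.4)]

snrs = [P_T * a ** 2 / (N0 * B) for a, _ in states]
probs = [p for _, p in states]

# Shannon (ergodic) capacity with receiver CSI: average of B log2(1 + snr)
c_fading = sum(p * B * math.log2(1.0 + s) for s, p in zip(snrs, probs))

# AWGN capacity at the same average SNR
avg_snr = sum(p * s for s, p in zip(snrs, probs))
c_awgn = B * math.log2(1.0 + avg_snr)

print(f"received SNRs: {[round(s, 4) for s in snrs]}")
print(f"fading capacity = {c_fading / 1e3:.2f} Kbps, "
      f"AWGN capacity = {c_awgn / 1e3:.2f} Kbps")
```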



Capacity with Outage 

Capacity with outage applies to slowly-varying channels, where the instantaneous SNR $\gamma$ is constant over a large
number of transmissions (a transmission burst) and then changes to a new value based on the fading distribution.
With this model, if the channel has received SNR $\gamma$ during a burst then data can be sent over the channel at rate
$B \log_2(1 + \gamma)$ with negligible probability of error.¹ Since the transmitter does not know the SNR value $\gamma$, it must
fix a transmission rate independent of the instantaneous received SNR.

Capacity with outage allows bits sent over a given transmission burst to be decoded at the end of the burst
with some probability that these bits will be decoded incorrectly. Specifically, the transmitter fixes a minimum
received SNR $\gamma_{min}$ and encodes for a data rate $C = B \log_2(1 + \gamma_{min})$. The data is correctly received if the
instantaneous received SNR is greater than or equal to $\gamma_{min}$ [17, 18]. If the received SNR is below $\gamma_{min}$ then,
with probability approaching one, the bits received over that transmission burst cannot be decoded correctly, and the
receiver declares an outage. The probability of outage is thus $p_{out} = p(\gamma < \gamma_{min})$. The average rate correctly
received over many transmission bursts is $C_o = (1 - p_{out}) B \log_2(1 + \gamma_{min})$ since data is only correctly received
on $1 - p_{out}$ transmissions. The value of $\gamma_{min}$ is a design parameter based on the acceptable outage probability.
Capacity with outage is typically characterized by a plot of capacity versus outage, as shown in Figure 4.2. In
this figure we plot the normalized capacity $C/B = \log_2(1 + \gamma_{min})$ as a function of outage probability $p_{out} =
p(\gamma < \gamma_{min})$ for a Rayleigh fading channel ($\gamma$ exponential) with $\bar{\gamma} = 20$ dB. We see that capacity approaches zero
for small outage probability, due to the requirement to correctly decode bits transmitted under severe fading, and
increases dramatically as outage probability increases. Note, however, that these high capacity values for large
outage probabilities have higher probability of incorrect data reception. The average rate correctly received can be
maximized by finding the $\gamma_{min}$ or, equivalently, the $p_{out}$, that maximizes $C_o$.
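
The curve described for Figure 4.2 can be sketched numerically. For Rayleigh fading $\gamma$ is exponential with mean $\bar{\gamma}$, so $p_{out} = 1 - e^{-\gamma_{min}/\bar{\gamma}}$ and hence $\gamma_{min} = -\bar{\gamma} \ln(1 - p_{out})$; this inversion is a step added here for illustration and is not written out in the text. The function names and the grid search over $p_{out}$ are likewise assumptions of this sketch.

```python
import math

def gamma_min_rayleigh(p_out, avg_snr):
    """Minimum SNR with P(gamma < gamma_min) = p_out for exponential gamma."""
    return -avg_snr * math.log(1.0 - p_out)

def outage_capacity(p_out, avg_snr):
    """Normalized capacity with outage, C/B = log2(1 + gamma_min)."""
    return math.log2(1.0 + gamma_min_rayleigh(p_out, avg_snr))

def avg_rate_correct(p_out, avg_snr):
    """Normalized average rate correctly received, C_o/B."""
    return (1.0 - p_out) * outage_capacity(p_out, avg_snr)

avg_snr = 100.0  # 20 dB, as in Figure 4.2
grid = [k / 1000 for k in range(1, 1000)]  # p_out values strictly inside (0, 1)
best_p = max(grid, key=lambda p: avg_rate_correct(p, avg_snr))
for p in (0.001, 0.01, 0.1, best_p):
    print(f"p_out = {p:.3f}: C/B = {outage_capacity(p, avg_snr):.3f}, "
          f"C_o/B = {avg_rate_correct(p, avg_snr):.3f}")
```

The grid search is a simple stand-in for the maximization of $C_o$ mentioned above; a finer grid or a one-dimensional optimizer would serve equally well.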



Example 4.3: Assume the same channel as in the previous example, with a bandwidth of 30 KHz and three
possible received SNRs: $\gamma_1 = .8333$ with $p(\gamma_1) = .1$, $\gamma_2 = 83.33$ with $p(\gamma_2) = .5$, and $\gamma_3 = 333.33$ with $p(\gamma_3) = .4$.
Find the capacity versus outage for this channel, and find the average rate correctly received for outage
probabilities $p_{out} < .1$, $p_{out} = .1$, and $p_{out} = .6$.



¹The assumption of constant fading over a large number of transmissions is needed since codes that achieve capacity require very large
blocklengths.

Figure 4.2: Normalized Capacity ($C/B$) versus Outage Probability.



Solution: For time-varying channels with discrete SNR values the capacity versus outage is a staircase
function. Specifically, for $p_{out} < .1$ we must decode correctly in all channel states. The minimum received SNR
for $p_{out}$ in this range of values is that of the weakest channel: $\gamma_{min} = \gamma_1$, and the corresponding capacity
is $C = B \log_2(1 + \gamma_{min}) = 30000 \log_2(1.833) = 26.23$ Kbps. For $.1 \le p_{out} < .6$ we can decode
incorrectly when the channel is in the weakest state only. Then $\gamma_{min} = \gamma_2$ and the corresponding capacity is
$C = B \log_2(1 + \gamma_{min}) = 30000 \log_2(84.33) = 191.94$ Kbps. For $.6 \le p_{out} < 1$ we can decode incorrectly if the
channel has received SNR $\gamma_1$ or $\gamma_2$. Then $\gamma_{min} = \gamma_3$ and the corresponding capacity is $C = B \log_2(1 + \gamma_{min}) =
30000 \log_2(334.33) = 251.55$ Kbps. Thus, capacity versus outage has $C = 26.23$ Kbps for $p_{out} < .1$, $C = 191.94$
Kbps for $.1 \le p_{out} < .6$, and $C = 251.55$ Kbps for $.6 \le p_{out} < 1$.

For $p_{out} < .1$ data transmitted at rates close to capacity $C = 26.23$ Kbps are always correctly received since
the channel can always support this data rate. For $p_{out} = .1$ we transmit at rates close to $C = 191.94$ Kbps,
but we can only correctly decode these data when the channel SNR is $\gamma_2$ or $\gamma_3$, so the rate correctly received is
$(1 - .1)191940 = 172.75$ Kbps. For $p_{out} = .6$ we transmit at rates close to $C = 251.55$ Kbps but we can only
correctly decode these data when the channel SNR is $\gamma_3$, so the rate correctly received is $(1 - .6)251550 = 100.62$
Kbps. It is likely that a good engineering design for this channel would send data at a rate close to 191.94 Kbps,
since it would only be received incorrectly at most 10% of the time and the data rate would be almost an order
of magnitude higher than sending at a rate commensurate with the worst-case channel capacity. However, 10%
retransmission probability is too high for some applications, in which case the system would be designed for the
26.23 Kbps data rate with no retransmissions. Design issues regarding acceptable retransmission probability will
be discussed in Chapter 14.
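
The staircase of Example 4.3 can be generated with a short sketch. The helper name `capacity_with_outage` and the small numerical tolerance are assumptions of this illustration; the SNR values and probabilities are those given in the example.

```python
import math

B = 30e3
# (received SNR, probability) for the three channel states, in ascending SNR order
states = [(0.8333, 0.1), (83.33, 0.5), (333.33, 0.4)]

def capacity_with_outage(p_out):
    """Return (capacity in bps, actual outage probability) for a target p_out.

    Picks the largest gamma_min among the state SNRs whose outage
    probability P(gamma < gamma_min) does not exceed the target.
    """
    best = None
    for gamma_min, _ in states:
        actual = sum(p for g, p in states if g < gamma_min)
        if actual <= p_out + 1e-12:  # tolerance guards against float round-off
            best = (B * math.log2(1.0 + gamma_min), actual)
    return best

for target in (0.05, 0.1, 0.6):
    cap, actual = capacity_with_outage(target)
    print(f"p_out = {target}: C = {cap / 1e3:.2f} Kbps, "
          f"rate correctly received = {(1 - actual) * cap / 1e3:.2f} Kbps")
```

Because the SNR takes only three values, every target outage probability inside one step of the staircase maps to the same $\gamma_{min}$, which is why the function also reports the outage probability actually incurred.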
4.2.4 Channel Side Information at Transmitter and Receiver

When both the transmitter and receiver have CSI, the transmitter can adapt its transmission strategy relative to this
CSI, as shown in Figure 4.3. In this case there is no notion of capacity versus outage where the transmitter sends 
bits that cannot be decoded, since the transmitter knows the channel and thus will not send bits unless they can be 
decoded correctly. In this section we will derive Shannon capacity assuming optimal power and rate adaptation 
relative to the CSI, as well as introduce alternate capacity definitions and their power and rate adaptation strategies. 







Figure 4.3: System Model with Transmitter and Receiver CSI. 



Shannon Capacity 

We now consider the Shannon capacity when the channel power gain $g[i]$ is known to both the transmitter and
receiver at time $i$. The Shannon capacity of a time-varying channel with side information about the channel state
at both the transmitter and receiver was originally considered by Wolfowitz for the following model. Let $s[i]$ be
a stationary and ergodic stochastic process representing the channel state, which takes values on a finite set $\mathcal{S}$ of
discrete memoryless channels. Let $C_s$ denote the capacity of a particular channel $s \in \mathcal{S}$, and $p(s)$ denote the
probability, or fraction of time, that the channel is in state $s$. The capacity of this time-varying channel is then
given by Theorem 4.6.1 of [19]:

$$C = \sum_{s \in \mathcal{S}} C_s\, p(s). \tag{4.6}$$

We now apply this formula to the system model in Figure 4.1. We know the capacity of an AWGN channel
with average received SNR $\gamma$ is $C_\gamma = B \log_2(1 + \gamma)$. Let $p(\gamma) = p(\gamma[i] = \gamma)$ denote the probability distribution
of the received SNR. From (4.6) the capacity of the fading channel with transmitter and receiver side information
is thus²

$$C = \int_0^{\infty} C_\gamma\, p(\gamma)\, d\gamma = \int_0^{\infty} B \log_2(1 + \gamma)\, p(\gamma)\, d\gamma. \tag{4.7}$$

We see that without power adaptation, (4.4) and (4.7) are the same, so transmitter side information does not increase
capacity unless power is also adapted.

Let us now allow the transmit power $P(\gamma)$ to vary with $\gamma$, subject to an average power constraint $\bar{P}$:

$$\int_0^{\infty} P(\gamma)\, p(\gamma)\, d\gamma \le \bar{P}. \tag{4.8}$$

With this additional constraint, we cannot apply (4.7) directly to obtain the capacity. However, we expect that the
capacity with this average power constraint will be the average capacity given by (4.7) with the power optimally
distributed over time. This motivates defining the fading channel capacity with average power constraint (4.8) as

$$C = \max_{P(\gamma):\, \int P(\gamma) p(\gamma) d\gamma = \bar{P}} \int_0^{\infty} B \log_2\left(1 + \frac{P(\gamma)\gamma}{\bar{P}}\right) p(\gamma)\, d\gamma. \tag{4.9}$$

²Wolfowitz’s result was for $\gamma$ ranging over a finite set, but it can be extended to infinite sets [21].

Figure 4.4: Multiplexed Coding and Decoding.


It is proved in [21] that the capacity given in (4.9) can be achieved, and any rate larger than this capacity has
probability of error bounded away from zero. The main idea behind the proof is a “time diversity” system with
multiplexed input and demultiplexed output, as shown in Figure 4.4. Specifically, we first quantize the range of
fading values to a finite set $\{\gamma_j : 1 \le j \le N\}$. For each $\gamma_j$, we design an encoder/decoder pair for an AWGN
channel with SNR $\gamma_j$. The input $x_j$ for encoder $\gamma_j$ has average power $P(\gamma_j)$ and data rate $R_j = C_j$, where $C_j$
is the capacity of a time-invariant AWGN channel with received SNR $P(\gamma_j)\gamma_j/\bar{P}$. These encoder/decoder pairs
correspond to a set of input and output ports associated with each $\gamma_j$. When $\gamma[i] \approx \gamma_j$, the corresponding pair of
ports are connected through the channel. The codewords associated with each $\gamma_j$ are thus multiplexed together for
transmission, and demultiplexed at the channel output. This effectively reduces the time-varying channel to a set
of time-invariant channels in parallel, where the $j$th channel only operates when $\gamma[i] \approx \gamma_j$. The average rate on the
channel is just the sum of rates associated with each of the $\gamma_j$ channels weighted by $p(\gamma_j)$, the percentage of time
that the channel SNR equals $\gamma_j$. This yields the average capacity formula (4.9).

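To find the optimal power allocation $P(\gamma)$, we form the Lagrangian

$$J(P(\gamma)) = \int_0^\infty B \log_2\!\left(1 + \frac{\gamma P(\gamma)}{\bar{P}}\right) p(\gamma)\, d\gamma - \lambda \int_0^\infty P(\gamma)\, p(\gamma)\, d\gamma. \tag{4.10}$$

Next we differentiate the Lagrangian and set the derivative equal to zero:

$$\frac{\partial J(P(\gamma))}{\partial P(\gamma)} = \left[\left(\frac{B/\ln(2)}{1 + \gamma P(\gamma)/\bar{P}}\right)\frac{\gamma}{\bar{P}} - \lambda\right] p(\gamma) = 0. \tag{4.11}$$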

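Solving for $P(\gamma)$ with the constraint that $P(\gamma) \ge 0$ yields the optimal power adaptation that maximizes (4.9) as

$$\frac{P(\gamma)}{\bar{P}} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma} & \gamma \ge \gamma_0 \\ 0 & \gamma < \gamma_0 \end{cases} \tag{4.12}$$

for some “cutoff” value $\gamma_0$. If $\gamma[i]$ is below this cutoff then no data is transmitted over the $i$th time interval, so the
channel is only used at time $i$ if $\gamma_0 \le \gamma[i] < \infty$. Substituting (4.12) into (4.9) then yields the capacity formula:

$$C = \int_{\gamma_0}^\infty B \log_2\!\left(\frac{\gamma}{\gamma_0}\right) p(\gamma)\, d\gamma. \tag{4.13}$$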
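
The step from the stationarity condition (4.11) to the water-filling form (4.12) can be spelled out as follows. This is a sketch of the algebra only; the identification of the cutoff with the Lagrange multiplier $\lambda$ is a labeling choice made here for illustration. For any $\gamma$ with $p(\gamma) > 0$, the bracketed term in (4.11) must vanish:

$$\frac{B}{\ln(2)} \cdot \frac{\gamma/\bar{P}}{1 + \gamma P(\gamma)/\bar{P}} = \lambda
\;\Rightarrow\; 1 + \frac{\gamma P(\gamma)}{\bar{P}} = \frac{B\,\gamma}{\lambda \bar{P} \ln(2)}
\;\Rightarrow\; \frac{P(\gamma)}{\bar{P}} = \frac{B}{\lambda \bar{P} \ln(2)} - \frac{1}{\gamma}.$$

Writing $1/\gamma_0 = B/(\lambda \bar{P} \ln(2))$ gives $P(\gamma)/\bar{P} = 1/\gamma_0 - 1/\gamma$, and the constraint $P(\gamma) \ge 0$ forces $P(\gamma) = 0$ whenever $\gamma < \gamma_0$. With this allocation, $1 + \gamma P(\gamma)/\bar{P} = \gamma/\gamma_0$ for $\gamma \ge \gamma_0$, which is the argument of the logarithm in (4.13).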









Figure 4.5: Optimal Power Allocation: Water-Filling. 



The multiplexing nature of the capacity-achieving coding strategy indicates that (4.13) is achieved with a time-varying
data rate, where the rate corresponding to instantaneous SNR $\gamma$ is $B \log_2(\gamma/\gamma_0)$. Since $\gamma_0$ is constant,
this means that as the instantaneous SNR increases, the data rate sent over the channel for that instantaneous SNR
also increases. Note that this multiplexing strategy is not the only way to achieve capacity (4.13): it can also be
achieved by adapting the transmit power and sending at a fixed rate [22]. We will see in Section 4.2.6 that for
Rayleigh fading this capacity can exceed that of an AWGN channel with the same average power, in contrast to
the case of receiver CSI only, where fading always decreases capacity.

Note that the optimal power allocation policy (4.12) only depends on the fading distribution $p(\gamma)$ through
the cutoff value $\gamma_0$. This cutoff value is found from the power constraint. Specifically, rearranging the power
constraint (4.8) and replacing the inequality with equality (since using the maximum available power will always
be optimal) yields the power constraint

$$\int_0^\infty \frac{P(\gamma)}{\bar{P}}\, p(\gamma)\, d\gamma = 1. \tag{4.14}$$



Now substituting the optimal power adaptation (4.12) into this expression yields that the cutoff value $\gamma_0$ must
satisfy

$$\int_{\gamma_0}^\infty \left(\frac{1}{\gamma_0} - \frac{1}{\gamma}\right) p(\gamma)\, d\gamma = 1. \tag{4.15}$$

Note that this expression only depends on the distribution $p(\gamma)$. The value for $\gamma_0$ cannot be solved for in closed
form for typical continuous pdfs $p(\gamma)$ and thus must be found numerically [23].
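
As an illustration of such a numerical search, the following Python sketch finds the cutoff for an exponentially distributed SNR (the Rayleigh fading case) by bisection on (4.15). It is a minimal sketch under stated assumptions, not the method of [23]: the function names, the trapezoid integration, the truncation of the infinite upper limit, and the 10 (i.e. 10 dB) average SNR are all choices made here for illustration. The bisection relies on two facts that follow from (4.15): its left side decreases as $\gamma_0$ grows, and $\gamma_0$ cannot exceed 1.

```python
import math


def integrate(f, a, b, steps=20000):
    """Composite trapezoid rule for f on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


def exp_pdf(g, mean_snr):
    """Exponential pdf of the received SNR with the given mean."""
    return math.exp(-g / mean_snr) / mean_snr


def power_constraint(gamma0, mean_snr, span=60.0):
    """Left side of (4.15); the upper limit is truncated (an assumption)."""
    upper = gamma0 + span * mean_snr
    return integrate(
        lambda g: (1.0 / gamma0 - 1.0 / g) * exp_pdf(g, mean_snr),
        gamma0,
        upper,
    )


def find_cutoff(mean_snr, iters=40):
    """Bisection for gamma0 on (0, 1]; assumes a moderate average SNR."""
    lo, hi = 1e-3, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power_constraint(mid, mean_snr) > 1.0:
            lo = mid  # constraint exceeded: the cutoff must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


mean_snr = 10.0  # average SNR of 10 dB, chosen only for illustration
gamma0 = find_cutoff(mean_snr)
print(f"cutoff gamma0 = {gamma0:.4f}")
```

The same structure applies to other continuous pdfs by swapping `exp_pdf` for the density of interest, provided the integration range and step size are adjusted to resolve it.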

Since $\gamma$ is time-varying, the maximizing power adaptation policy of (4.12) is a “water-filling” formula in time,
as illustrated in Figure 4.5. This curve shows how much power is allocated to the channel for instantaneous SNR
$\gamma(t) = \gamma$. The water-filling terminology refers to the fact that the line $1/\gamma$ sketches out the bottom of a bowl,
and power is poured into the bowl to a constant water level of $1/\gamma_0$. The amount of power allocated for a given $\gamma$
equals $1/\gamma_0 - 1/\gamma$, the amount of water between the bottom of the bowl ($1/\gamma$) and the constant water line ($1/\gamma_0$).
The intuition behind water-filling is to take advantage of good channel conditions: when channel conditions are
good ($\gamma$ large) more power and a higher data rate are sent over the channel. As channel quality degrades ($\gamma$ small)
less power and rate are sent over the channel. If the instantaneous channel SNR falls below the cutoff value, the
channel is not used. Adaptive modulation and coding techniques that follow this same principle were developed in
[24, 25] and are discussed in Chapter 9.

Note that the multiplexing argument sketching how capacity (4.9) is achieved applies to any power adaptation
policy, i.e. for any power adaptation policy $P(\gamma)$ with average power $\bar{P}$ the capacity

$$C = \int_0^\infty B \log_2\!\left(1 + \frac{P(\gamma)\gamma}{\bar{P}}\right) p(\gamma)\, d\gamma \tag{4.16}$$

can be achieved with arbitrarily small error probability. Of course this capacity cannot exceed (4.9), where power
adaptation is optimized to maximize capacity. However, there are scenarios where a suboptimal power adaptation
policy might have desirable properties that outweigh capacity maximization. In the next two sections we
discuss two such suboptimal policies, which result in constant data rate systems, in contrast to the variable-rate
transmission policy that achieves the capacity in (4.9).



Example 4.4: Assume the same channel as in the previous example, with a bandwidth of 30 KHz and three possible
received SNRs: $\gamma_1 = .8333$ with $p(\gamma_1) = .1$, $\gamma_2 = 83.33$ with $p(\gamma_2) = .5$, and $\gamma_3 = 333.33$ with $p(\gamma_3) = .4$.
Find the ergodic capacity of this channel assuming both transmitter and receiver have instantaneous CSI.



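Solution: We know the optimal power allocation is water-filling, and we need to find the cutoff value $\gamma_0$ that
satisfies the discrete version of (4.15) given by

$$\sum_{\gamma_i \ge \gamma_0} \left(\frac{1}{\gamma_0} - \frac{1}{\gamma_i}\right) p(\gamma_i) = 1. \tag{4.17}$$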



We first assume that all channel states are used to obtain $\gamma_0$, i.e. assume $\gamma_0 \le \min_i \gamma_i$, and see if the resulting cutoff
value is below that of the weakest channel. If not then we have an inconsistency, and must redo the calculation
assuming at least one of the channel states is not used. Applying (4.17) to our channel model yields

$$\sum_{i=1}^{3} \frac{p(\gamma_i)}{\gamma_0} - \sum_{i=1}^{3} \frac{p(\gamma_i)}{\gamma_i} = 1 \;\Rightarrow\; \frac{1}{\gamma_0} = 1 + \sum_{i=1}^{3} \frac{p(\gamma_i)}{\gamma_i} = 1 + \left(\frac{.1}{.8333} + \frac{.5}{83.33} + \frac{.4}{333.33}\right) = 1.13.$$



Solving for $\gamma_0$ yields $\gamma_0 = 1/1.13 = .89 > .8333 = \gamma_1$. Since this value of $\gamma_0$ is greater than the SNR in the
weakest channel, it is inconsistent as the channel should only be used for SNRs above the cutoff value. Therefore,
we now redo the calculation assuming that the weakest state is not used. Then (4.17) becomes

$$\sum_{i=2}^{3} \frac{p(\gamma_i)}{\gamma_0} - \sum_{i=2}^{3} \frac{p(\gamma_i)}{\gamma_i} = 1 \;\Rightarrow\; \frac{.9}{\gamma_0} = 1 + \sum_{i=2}^{3} \frac{p(\gamma_i)}{\gamma_i} = 1 + \left(\frac{.5}{83.33} + \frac{.4}{333.33}\right) = 1.0072.$$

Solving for $\gamma_0$ yields $\gamma_0 = .89$. So by assuming the weakest channel with SNR $\gamma_1$ is not used, we obtain a
consistent value for $\gamma_0$ with $\gamma_1 < \gamma_0 < \gamma_2$. The capacity of the channel then becomes

$$C = \sum_{i=2}^{3} B \log_2(\gamma_i/\gamma_0)\, p(\gamma_i) = 30000\left(.5 \log_2(83.33/.89) + .4 \log_2(333.33/.89)\right) = 200.82 \text{ Kbps}.$$



Comparing with the results of the previous example we see that this rate is only slightly higher than for the case 
of receiver CSI only, and is still significantly below that of an AWGN channel with the same average SNR. That 
is because the average SNR for this channel is relatively high: for low SNR channels capacity in flat-fading can 
exceed that of the AWGN channel with the same SNR by taking advantage of the rare times when the channel is 
in a very good state. 
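
The trial-and-error search for a consistent cutoff in Example 4.4 can be sketched in Python as follows. This is a minimal illustration with hypothetical function names, assuming a discrete SNR distribution: it tries all states first and drops the weakest state each time the resulting cutoff exceeds the weakest SNR still in use. Because the code keeps full precision for $\gamma_0$ rather than rounding to .89, it gives roughly 200.7 Kbps, slightly below the 200.82 Kbps computed by hand above.

```python
import math


def waterfill_cutoff(snrs, probs):
    """Cutoff gamma0 satisfying the discrete constraint (4.17)."""
    states = sorted(zip(snrs, probs))  # weakest state first
    for first in range(len(states)):
        used = states[first:]
        p_used = sum(p for _, p in used)
        inv_sum = sum(p / g for g, p in used)
        # From sum over used states of p * (1/gamma0 - 1/g) = 1
        gamma0 = p_used / (1.0 + inv_sum)
        if gamma0 <= used[0][0]:  # consistent: all used states are above cutoff
            return gamma0
    raise ValueError("snrs and probs must be non-empty")


def waterfill_capacity(bandwidth, snrs, probs):
    """Capacity with water-filling over a discrete set of SNR states."""
    gamma0 = waterfill_cutoff(snrs, probs)
    rate = sum(
        p * math.log2(g / gamma0) for g, p in zip(snrs, probs) if g >= gamma0
    )
    return bandwidth * rate, gamma0


B = 30e3  # bandwidth in Hz
snrs = [0.8333, 83.33, 333.33]
probs = [0.1, 0.5, 0.4]
capacity, gamma0 = waterfill_capacity(B, snrs, probs)
print(f"gamma0 = {gamma0:.4f}, capacity = {capacity / 1e3:.2f} Kbps")
```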
Zero-Outage Capacity and Channel Inversion



We now consider a suboptimal transmitter adaptation scheme where the transmitter uses the CSI to maintain a
constant received power, i.e., it inverts the channel fading. The channel then appears to the encoder and decoder as
a time-invariant AWGN channel. This power adaptation, called channel inversion, is given by $P(\gamma)/\bar{P} = \sigma/\gamma$,
where $\sigma$ equals the constant received SNR that can be maintained with the transmit power constraint (4.8). The
constant $\sigma$ thus satisfies $\int \frac{\sigma}{\gamma}\, p(\gamma)\, d\gamma = 1$, so $\sigma = 1/E[1/\gamma]$.

Fading channel capacity with channel inversion is just the capacity of an AWGN channel with SNR $\sigma$:

$$C = B \log_2[1 + \sigma] = B \log_2\!\left[1 + \frac{1}{E[1/\gamma]}\right]. \tag{4.18}$$



The capacity-achieving transmission strategy for this capacity uses a fixed-rate encoder and decoder designed for 
an AWGN channel with SNR $\sigma$. This has the advantage of maintaining a fixed data rate over the channel regardless
of channel conditions. For this reason the channel capacity given in (4.18) is called zero-outage capacity, since 
the data rate is fixed under all channel conditions and there is no channel outage. Note that there exist practical 
coding techniques that achieve near-capacity data rates on AWGN channels, so the zero-outage capacity can be 
approximately achieved in practice. 

Zero-outage capacity can exhibit a large data rate reduction relative to Shannon capacity in extreme fading 
environments. For example, in Rayleigh fading $E[1/\gamma]$ is infinite, and thus the zero-outage capacity given by (4.18)
is zero. Channel inversion is common in spread spectrum systems with near-far interference imbalances [26]. It 
is also the simplest scheme to implement, since the encoder and decoder are designed for an AWGN channel, 
independent of the fading statistics. 
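
The divergence of $E[1/\gamma]$ in Rayleigh fading can be checked with a short calculation. As a sketch, assume the received SNR is exponentially distributed with mean $\bar{\gamma}$, which is the SNR distribution under Rayleigh fading. Then for any $\epsilon > 0$,

$$E[1/\gamma] = \int_0^\infty \frac{1}{\gamma} \cdot \frac{1}{\bar{\gamma}} e^{-\gamma/\bar{\gamma}}\, d\gamma \;\ge\; \frac{e^{-\epsilon/\bar{\gamma}}}{\bar{\gamma}} \int_0^\epsilon \frac{d\gamma}{\gamma} = \infty,$$

since the pdf does not vanish at $\gamma = 0$ and $1/\gamma$ is not integrable there. Hence $\sigma = 1/E[1/\gamma] = 0$ and (4.18) gives $C = B \log_2(1 + 0) = 0$.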



Example 4.5: Assume the same channel as in the previous example, with a bandwidth of 30 KHz and three possible
received SNRs: $\gamma_1 = .8333$ with $p(\gamma_1) = .1$, $\gamma_2 = 83.33$ with $p(\gamma_2) = .5$, and $\gamma_3 = 333.33$ with $p(\gamma_3) = .4$.
Assuming transmitter and receiver CSI, find the zero-outage capacity of this channel.

Solution: The zero-outage capacity is $C = B \log_2[1 + \sigma]$, where $\sigma = 1/E[1/\gamma]$. Since

$$E[1/\gamma] = \frac{.1}{.8333} + \frac{.5}{83.33} + \frac{.4}{333.33} = .1272,$$

we have $C = 30000 \log_2(1 + 1/.1272) = 94.43$ Kbps. Note that this is less than half of the Shannon capacity with
optimal water-filling adaptation.
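
The computation in Example 4.5 reduces to one expectation and one logarithm, as the following Python sketch shows. The function name is hypothetical and a discrete SNR distribution is assumed.

```python
import math


def zero_outage_capacity(bandwidth, snrs, probs):
    """Zero-outage capacity (4.18) under channel inversion."""
    inv_mean = sum(p / g for g, p in zip(snrs, probs))  # E[1/gamma]
    sigma = 1.0 / inv_mean  # constant received SNR after inversion
    return bandwidth * math.log2(1.0 + sigma), sigma


B = 30e3  # bandwidth in Hz
snrs = [0.8333, 83.33, 333.33]
probs = [0.1, 0.5, 0.4]
capacity, sigma = zero_outage_capacity(B, snrs, probs)
print(f"sigma = {sigma:.3f}, zero-outage capacity = {capacity / 1e3:.2f} Kbps")
```

Note how the weakest state dominates $E[1/\gamma]$: the term .1/.8333 contributes .12 of the total .1272, which is why inverting the channel in all states is so costly here.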



Outage Capacity and Truncated Channel Inversion 



The reason zero-outage capacity may be significantly smaller than Shannon capacity on a fading channel is the
requirement to maintain a constant data rate in all fading states. By suspending transmission in particularly bad
fading states (outage channel states), we can maintain a higher constant data rate in the other states and thereby
significantly increase capacity. The outage capacity is defined as the maximum data rate that can be maintained
in all nonoutage channel states times the probability of nonoutage. Outage capacity is achieved with a truncated
channel inversion policy for power adaptation that only compensates for fading above a certain cutoff fade depth
$\gamma_0$:

$$\frac{P(\gamma)}{\bar{P}} = \begin{cases} \dfrac{\sigma}{\gamma} & \gamma \ge \gamma_0 \\ 0 & \gamma < \gamma_0 \end{cases} \tag{4.19}$$

where $\gamma_0$ is based on the outage probability: $p_{out} = p(\gamma < \gamma_0)$. Since the channel is only used when $\gamma \ge \gamma_0$, the
power constraint (4.8) yields $\sigma = 1/E_{\gamma_0}[1/\gamma]$, where

$$E_{\gamma_0}[1/\gamma] = \int_{\gamma_0}^\infty \frac{1}{\gamma}\, p(\gamma)\, d\gamma. \tag{4.20}$$

The outage capacity associated with a given outage probability $p_{out}$ and corresponding cutoff $\gamma_0$ is given by

$$C(p_{out}) = B \log_2\!\left(1 + \frac{1}{E_{\gamma_0}[1/\gamma]}\right) p(\gamma \ge \gamma_0). \tag{4.21}$$

We can also obtain the maximum outage capacity by maximizing outage capacity over all possible $\gamma_0$:

$$C = \max_{\gamma_0} B \log_2\!\left(1 + \frac{1}{E_{\gamma_0}[1/\gamma]}\right) p(\gamma \ge \gamma_0). \tag{4.22}$$

This maximum outage capacity will still be less than Shannon capacity (4.13) since truncated channel inversion
is a suboptimal transmission strategy. However, the transmit and receive strategies associated with inversion or
truncated inversion may be easier to implement or have lower complexity than the water-filling schemes associated
with Shannon capacity.



Example 4.6: Assume the same channel as in the previous example, with a bandwidth of 30 KHz and three possible
received SNRs: $\gamma_1 = .8333$ with $p(\gamma_1) = .1$, $\gamma_2 = 83.33$ with $p(\gamma_2) = .5$, and $\gamma_3 = 333.33$ with $p(\gamma_3) = .4$. Find
the outage capacity of this channel and associated outage probabilities for cutoff values $\gamma_0 = .84$ and $\gamma_0 = 83.4$.
Which of these cutoff values yields a larger outage capacity?

Solution: For $\gamma_0 = .84$ we use the channel when the SNR is $\gamma_2$ or $\gamma_3$, so $p_{out} = p(\gamma_1) = .1$ and
$E_{\gamma_0}[1/\gamma] = \sum_{i=2}^{3} p(\gamma_i)/\gamma_i = .5/83.33 + .4/333.33 = .0072$. The outage capacity is
$C = B \log_2(1 + 1/E_{\gamma_0}[1/\gamma])\, p(\gamma \ge \gamma_0) = 30000 \log_2(1 + 138.88) \cdot .9 = 192.457$ Kbps.
For $\gamma_0 = 83.4$ we use the channel when the SNR is $\gamma_3$ only, so $p_{out} = p(\gamma_1) + p(\gamma_2) = .6$ and
$E_{\gamma_0}[1/\gamma] = p(\gamma_3)/\gamma_3 = .4/333.33 = .0012$. The capacity is
$C = B \log_2(1 + 1/E_{\gamma_0}[1/\gamma])\, p(\gamma \ge \gamma_0) = 30000 \log_2(1 + 833.33) \cdot .4 = 116.45$ Kbps.
The outage capacity is larger when the channel is used for SNRs $\gamma_2$ and $\gamma_3$. Even though the SNR
$\gamma_3$ is significantly larger than $\gamma_2$, the fact that this SNR only occurs 40% of the time makes it inefficient to only use
the channel in this best state.
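
The two cases of Example 4.6, and the maximization in (4.22), can be reproduced with a few lines of Python. This is a minimal sketch with a hypothetical function name, assuming a discrete SNR distribution. For such a distribution the outage capacity only changes when the set of used states changes, so it suffices to try each state's SNR as the cutoff when searching for the maximum.

```python
import math


def outage_capacity(bandwidth, snrs, probs, gamma0):
    """Outage capacity (4.21) and outage probability for cutoff gamma0."""
    used = [(g, p) for g, p in zip(snrs, probs) if g >= gamma0]
    if not used:
        return 0.0, 1.0  # channel never used
    p_on = sum(p for _, p in used)  # probability of nonoutage
    inv_mean = sum(p / g for g, p in used)  # E_{gamma0}[1/gamma]
    sigma = 1.0 / inv_mean
    return bandwidth * math.log2(1.0 + sigma) * p_on, 1.0 - p_on


B = 30e3  # bandwidth in Hz
snrs = [0.8333, 83.33, 333.33]
probs = [0.1, 0.5, 0.4]

c_low, pout_low = outage_capacity(B, snrs, probs, 0.84)
c_high, pout_high = outage_capacity(B, snrs, probs, 83.4)
print(f"gamma0 = .84:  C = {c_low / 1e3:.2f} Kbps, p_out = {pout_low:.1f}")
print(f"gamma0 = 83.4: C = {c_high / 1e3:.2f} Kbps, p_out = {pout_high:.1f}")

# Maximum outage capacity (4.22): try each state's SNR as the cutoff.
c_max = max(outage_capacity(B, snrs, probs, g)[0] for g in snrs)
print(f"maximum outage capacity = {c_max / 1e3:.2f} Kbps")
```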



4.2.5 Capacity with Receiver Diversity 

Receiver diversity is a well-known technique to improve the performance of wireless communications in fading
channels. The main advantage of receiver diversity is that it mitigates the fluctuations due to fading so that the
channel appears more like an AWGN channel. More details on receiver diversity and its performance will be
given in Chapter 7. Since receiver diversity mitigates the impact of fading, an interesting question is whether it
also increases the capacity of a fading channel. The capacity calculation under diversity combining first requires
that the distribution of the received SNR $p(\gamma)$ under the given diversity combining technique be obtained. Once
this distribution is known it can be substituted into any of the capacity formulas above to obtain the capacity
under diversity combining. The specific capacity formula used depends on the assumptions about channel side
information, e.g. for the case of perfect transmitter and receiver CSI the formula (4.13) would be used. Capacity
under both maximal ratio and selection combining diversity for these different capacity formulas was computed
in [23]. It was found that, as expected, the capacity with perfect transmitter and receiver CSI is bigger than with
receiver CSI only, which in turn is bigger than with channel inversion. The performance gap of these different
formulas decreases as the number of antenna branches increases. This trend is expected, since a large number of
antenna branches makes the channel look like AWGN, for which all of the different capacity formulas have roughly
the same performance.

Recently there has been much research activity on systems with multiple antennas at both the transmitter
and the receiver. The excitement in this area stems from the breakthrough results in [28, 27, 29] indicating that
the capacity of a fading channel with multiple inputs and outputs (a MIMO channel) is $M$ times larger than the
channel capacity without multiple antennas, where $M = \min(M_t, M_r)$ for $M_t$ the number of transmit antennas
and $M_r$ the number of receive antennas. We will discuss capacity of multiple antenna systems in Chapter 10.

4.2.6 Capacity Comparisons 

In this section we compare capacity with transmitter and receiver CSI for different power allocation policies along
with the capacity under receiver CSI only. Figures 4.6, 4.7, and 4.8 show plots of the different capacities (4.4),
(4.9), (4.18), and (4.22) as a function of average received SNR for log-normal fading ($\sigma = 8$ dB standard deviation),
Rayleigh fading, and Nakagami fading (with Nakagami parameter $m = 2$). Nakagami fading with $m = 2$ is
roughly equivalent to Rayleigh fading with two-antenna receiver diversity. The capacity in AWGN for the same
average power is also shown for comparison. Note that the capacity in log-normal fading is plotted relative to
average dB SNR ($\mu_{dB}$), not average SNR in dB ($10 \log_{10} \mu$): the relation between these values, as given by (2.45)
in Chapter 2, is $10 \log_{10} \mu = \mu_{dB} + \sigma_{dB}^2 \ln(10)/20$.




Figure 4.6: Capacity in Log-Normal Shadowing. 

Several observations in this comparison are worth noting. First, we see in the figure that the capacity of the 
AWGN channel is larger than that of the fading channel for all cases. However, at low SNRs the AWGN and 
fading channel with transmitter and receiver CSI have almost the same capacity. In fact, at low SNRs (below 
0 dB), capacity of the fading channel with transmitter and receiver CSI is larger than the corresponding AWGN 
channel capacity. That is because the AWGN channel always has the same low SNR, thereby limiting its capacity.
A fading channel with this same low average SNR will occasionally have a high SNR, since the distribution has
infinite range. Thus, if all power and rate are transmitted over the channel during these very infrequent high SNR
values, the capacity will be larger than on the AWGN channel with the same low average SNR.

The severity of the fading is indicated by the Nakagami parameter $m$, where $m = 1$ for Rayleigh fading and
$m = \infty$ for an AWGN channel without fading. Thus, comparing Figures 4.7 and 4.8 we see that, as the severity
of the fading decreases (Rayleigh to Nakagami with $m = 2$), the capacity difference between the various adaptive
policies also decreases, and their respective capacities approach that of the AWGN channel.

Figure 4.7: Capacity in Rayleigh Fading.

Figure 4.8: Capacity in Nakagami Fading ($m = 2$).

The difference between the capacity curves under transmitter and receiver CSI (4.9) and receiver CSI only
(4.4) is negligible in all cases. Recalling that capacity under receiver CSI only (4.4) and under transmitter and
receiver CSI without power adaptation (4.7) are the same, this implies that when the transmission rate is adapted
relative to the channel, adapting the power as well yields a negligible capacity gain. It also indicates that transmitter
adaptation yields a negligible capacity gain relative to using only receiver side information. In severe fading conditions
(Rayleigh and log-normal fading), maximum outage capacity exhibits a 1-5 dB rate penalty and zero-outage
capacity yields a very large capacity loss relative to Shannon capacity. However, under mild fading conditions
(Nakagami with $m = 2$) the Shannon, maximum outage, and zero-outage capacities are within 3 dB of each other
and within 4 dB of the AWGN channel capacity. These differences will further decrease as the fading diminishes
($m \to \infty$ for Nakagami fading).

We can view these results as a tradeoff between capacity and complexity. The adaptive policy with transmitter
and receiver side information requires more complexity in the transmitter (and it typically also requires a feedback
path between the receiver and transmitter to obtain the side information). However, the decoder in the receiver is
relatively simple. The nonadaptive policy has a relatively simple transmission scheme, but its code design must
use the channel correlation statistics (often unknown), and the decoder complexity is proportional to the channel
decorrelation time. The channel inversion and truncated inversion policies use codes designed for AWGN channels,
and are therefore the least complex to implement, but in severe fading conditions they exhibit large capacity losses
relative to the other techniques.

In general, Shannon capacity analysis does not show how to design adaptive or nonadaptive techniques for
real systems. Achievable rates for adaptive trellis-coded MQAM have been investigated in [25], where a simple 4- 
state trellis code combined with adaptive six-constellation MQAM modulation was shown to achieve rates within 7 
dB of the Shannon capacity (4.9) in Figures 4.6 and 4.7. More complex codes further close the gap to the Shannon 
limit of fading channels with transmitter adaptation. 

4.3 Capacity of Frequency-Selective Fading Channels 

In this section we consider the Shannon capacity of frequency-selective fading channels. We first consider the 
capacity of a time-invariant frequency-selective fading channel. This capacity analysis is similar to that of a flat- 
fading channel with the time axis replaced by the frequency axis. Next we discuss the capacity of time-varying 
frequency-selective fading channels. 

4.3.1 Time-Invariant Channels 

Consider a time-invariant channel with frequency response $H(f)$, as shown in Figure 4.9. Assume a total transmit
power constraint $P$. When the channel is time-invariant it is typically assumed that $H(f)$ is known at both the
transmitter and receiver: capacity of time-invariant channels under different assumptions of this channel knowledge
is discussed in [18].




Figure 4.9: Time-Invariant Frequency-Selective Fading Channel. 



Let us first assume that $H(f)$ is block-fading, so that frequency is divided into subchannels of bandwidth $B$,
where $H(f) = H_j$ is constant over each block, as shown in Figure 4.10. The frequency-selective fading channel
thus consists of a set of AWGN channels in parallel with SNR $|H_j|^2 P_j/(N_0 B)$ on the $j$th channel, where $P_j$ is the
power allocated to the $j$th channel in this parallel set, subject to the power constraint $\sum_j P_j \le P$.

The capacity of this parallel set of channels is the sum of rates associated with each channel with power
optimally allocated over all channels [5, 6]:

$$C = \max_{P_j:\, \sum_j P_j \le P} \sum_j B \log_2\!\left(1 + \frac{|H_j|^2 P_j}{N_0 B}\right). \tag{4.23}$$



Note that this is similar to the capacity and optimal power allocation for a flat-fading channel, with power and rate
changing over frequency in a deterministic way rather than over time in a probabilistic way. The optimal power
allocation is found via the same Lagrangian technique used in the flat-fading case, which leads to the water-filling
power allocation

$$\frac{P_j}{P} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma_j} & \gamma_j \ge \gamma_0 \\ 0 & \gamma_j < \gamma_0 \end{cases} \tag{4.24}$$

for some cutoff value $\gamma_0$, where $\gamma_j = |H_j|^2 P/(N_0 B)$ is the SNR associated with the $j$th channel assuming it is
allocated the entire power budget. This optimal power allocation is illustrated in Figure 4.11. The cutoff value is
obtained by substituting the power adaptation formula into the power constraint, so $\gamma_0$ must satisfy

$$\sum_{j:\, \gamma_j \ge \gamma_0} \left(\frac{1}{\gamma_0} - \frac{1}{\gamma_j}\right) = 1. \tag{4.25}$$

Figure 4.10: Block Frequency-Selective Fading



The capacity then becomes

$$C = \sum_{j:\, \gamma_j \ge \gamma_0} B \log_2(\gamma_j/\gamma_0). \tag{4.26}$$

This capacity is achieved by sending at different rates and powers over each subchannel. Multicarrier modulation
uses the same technique in adaptive loading, as discussed in more detail in Chapter 12.

Figure 4.11: Water-Filling in Block Frequency-Selective Fading



When $H(f)$ is continuous the capacity under power constraint $P$ is similar to the case of the block-fading
channel, with some mathematical intricacies needed to show that the channel capacity is given by

$$C = \max_{P(f):\, \int P(f)\, df \le P} \int \log_2\!\left(1 + \frac{|H(f)|^2 P(f)}{N_0}\right) df. \tag{4.27}$$

The equation inside the integral can be thought of as the incremental capacity associated with a given frequency $f$
over the bandwidth $df$ with power allocation $P(f)$ and channel gain $|H(f)|^2$. This result is formally proven using
a Karhunen-Loeve expansion of the channel $h(t)$ to create an equivalent set of parallel independent channels [5,
Chapter 8.5]. An alternate proof decomposes the channel into a parallel set using the discrete Fourier transform
(DFT) [12]: the same premise is used in the discrete implementation of multicarrier modulation described in
Chapter 12.4.

The power allocation over frequency, $P(f)$, that maximizes (4.27) is found via the Lagrangian technique. The
resulting optimal power allocation is water-filling over frequency:

$$\frac{P(f)}{P} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma(f)} & \gamma(f) \ge \gamma_0 \\ 0 & \gamma(f) < \gamma_0 \end{cases} \tag{4.28}$$

where $\gamma(f) = |H(f)|^2 P/N_0$. This results in channel capacity

$$C = \int_{f:\, \gamma(f) \ge \gamma_0} \log_2(\gamma(f)/\gamma_0)\, df. \tag{4.29}$$



Example 4.7: Consider a time-invariant frequency-selective block fading channel consisting of three subchannels
of bandwidth $B = 1$ MHz. The frequency response associated with each channel is $H_1 = 1$, $H_2 = 2$ and $H_3 = 3$.
The transmit power constraint is $P = 10$ mW and the noise PSD is $N_0 = 10^{-9}$ W/Hz. Find the Shannon capacity
of this channel and the optimal power allocation that achieves this capacity.

Solution: We first find $\gamma_j = |H_j|^2 P/(N_0 B)$ for each subchannel, yielding $\gamma_1 = 10$, $\gamma_2 = 40$ and $\gamma_3 = 90$. The
cutoff $\gamma_0$ must satisfy (4.25). Assuming all subchannels are allocated power, this yields

$$\frac{3}{\gamma_0} = 1 + \sum_j \frac{1}{\gamma_j} = 1.14 \;\Rightarrow\; \gamma_0 = 2.64 < \gamma_j\ \forall j.$$

Since the cutoff $\gamma_0$ is less than $\gamma_j$ for all $j$, our assumption that all subchannels are allocated power is consistent, so
this is the correct cutoff value. The corresponding capacity is
$C = \sum_{j=1}^{3} B \log_2(\gamma_j/\gamma_0) = 1000000(\log_2(10/2.64) + \log_2(40/2.64) + \log_2(90/2.64)) = 10.93$ Mbps.
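
Example 4.7 also asks for the power allocation itself, which follows from (4.24) once $\gamma_0$ is known. The Python sketch below computes the cutoff, the per-subchannel powers, and the capacity. The function name is hypothetical, and the code assumes all channel gains are nonzero.

```python
import math


def waterfill_subchannels(gains, total_power, noise_psd, bandwidth):
    """Water-filling over parallel subchannels, following (4.24)-(4.26)."""
    if not gains:
        raise ValueError("at least one subchannel is required")
    # SNR of each subchannel if it received the entire power budget
    snrs = [abs(h) ** 2 * total_power / (noise_psd * bandwidth) for h in gains]
    active = sorted(snrs)  # weakest subchannel first
    gamma0 = None
    while active:
        # Solve (4.25) assuming exactly the subchannels in `active` are used
        gamma0 = len(active) / (1.0 + sum(1.0 / g for g in active))
        if gamma0 <= active[0]:
            break  # consistent cutoff found
        active.pop(0)  # drop the weakest subchannel and retry
    powers = [total_power * max(1.0 / gamma0 - 1.0 / g, 0.0) for g in snrs]
    capacity = sum(
        bandwidth * math.log2(g / gamma0) for g in snrs if g >= gamma0
    )
    return gamma0, powers, capacity


gamma0, powers, capacity = waterfill_subchannels(
    gains=[1.0, 2.0, 3.0], total_power=10e-3, noise_psd=1e-9, bandwidth=1e6
)
print(f"gamma0 = {gamma0:.3f}")
print("powers (mW):", [round(p * 1e3, 3) for p in powers])
print(f"capacity = {capacity / 1e6:.2f} Mbps")
```

The allocated powers sum to the 10 mW budget, with the strongest subchannel receiving the largest share, as the water-filling picture suggests.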



4.3.2 Time-Varying Channels 

The time-varying frequency-selective fading channel is similar to the model shown in Figure 4.9, except that
$H(f) = H(f, i)$, i.e. the channel varies over both frequency and time. It is difficult to determine the capacity of
time-varying frequency-selective fading channels, even when the instantaneous channel $H(f, i)$ is known perfectly
at the transmitter and receiver, due to the random effects of self-interference, i.e. intersymbol interference (ISI).
In the case of transmitter and receiver side information, the optimal adaptation scheme must consider the effect of
the channel on the past sequence of transmitted bits, and how the ISI resulting from these bits will affect future
transmissions [30]. The capacity of time-varying frequency-selective fading channels is in general unknown;
however, upper and lower bounds and limiting formulas exist [30, 31].

We can approximate channel capacity in time-varying frequency-selective fading by taking the channel bandwidth
$B$ of interest and dividing it up into subchannels the size of the channel coherence bandwidth $B_c$, as shown in
Figure 4.12. We then assume that each of the resulting subchannels is independent, time-varying, and flat-fading
with $H(f, i) = H_j[i]$ on the $j$th subchannel.

Under this assumption, we obtain the capacity for each of these flat-fading subchannels based on the average
power $\bar{P}_j$ that we allocate to each subchannel, subject to a total power constraint $\bar{P}$. Since the channels are
independent, the total channel capacity is just equal to the sum of capacities on the individual narrowband
flat-fading channels subject to the total average power constraint, averaged over both time and frequency:

$$C = \max_{\{\bar{P}_j\}:\, \sum_j \bar{P}_j \le \bar{P}} \sum_j C_j(\bar{P}_j), \tag{4.30}$$

where $C_j(\bar{P}_j)$ is the capacity of the flat-fading subchannel with average power $\bar{P}_j$ and bandwidth $B_c$ given by
(4.13), (4.4), (4.18), or (4.22) for Shannon capacity under different side information and power allocation policies.
We can also define $C_j(\bar{P}_j)$ as a capacity versus outage if only the receiver has side information.

Figure 4.12: Channel Division in Frequency-Selective Fading

We will focus on Shannon capacity assuming perfect transmitter and receiver CSI, since this upper-bounds
capacity under any other side information assumptions or suboptimal power allocation strategies. We know
that if we fix the average power per subchannel, the optimal power adaptation follows a water-filling formula. We
also expect that the optimal average power allocated to each subchannel should follow a water-filling,
where more average power is allocated to better subchannels. Thus we expect that the optimal power allocation
is a two-dimensional water-filling in both time and frequency. We now obtain this optimal two-dimensional
water-filling and the corresponding Shannon capacity.

Define $\gamma_j[i] = |H_j[i]|^2 \bar{P}/(N_0 B)$ to be the instantaneous SNR on the $j$th subchannel at time $i$ assuming the
total power $\bar{P}$ is allocated to that time and frequency. We allow the power $P_j(\gamma_j)$ to vary with $\gamma_j[i]$. The Shannon
capacity with perfect transmitter and receiver CSI is given by optimizing power adaptation relative to both time
(represented by $\gamma_j[i] = \gamma_j$) and frequency (represented by the subchannel index $j$):

$$C = \max_{P_j(\gamma_j):\, \sum_j \int_0^\infty P_j(\gamma_j) p(\gamma_j) d\gamma_j \le \bar{P}} \sum_j \int_0^\infty B_c \log_2\!\left(1 + \frac{P_j(\gamma_j)\gamma_j}{\bar{P}}\right) p(\gamma_j)\, d\gamma_j. \tag{4.31}$$



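To find the optimal power allocation $P_j(\gamma_j)$, we form the Lagrangian

$$J(P_j(\gamma_j)) = \sum_j \int_0^\infty B_c \log_2\!\left(1 + \frac{P_j(\gamma_j)\gamma_j}{\bar{P}}\right) p(\gamma_j)\, d\gamma_j - \lambda \sum_j \int_0^\infty P_j(\gamma_j)\, p(\gamma_j)\, d\gamma_j. \tag{4.32}$$

Note that (4.32) is similar to the Lagrangian for the flat-fading channel (4.10) except that the dimension of frequency
has been added by summing over the subchannels. Differentiating the Lagrangian and setting this derivative
equal to zero eliminates all terms except the given subchannel and associated SNR:

$$\frac{\partial J(P_j(\gamma_j))}{\partial P_j(\gamma_j)} = \left[\left(\frac{B_c/\ln(2)}{1 + \gamma_j P_j(\gamma_j)/\bar{P}}\right)\frac{\gamma_j}{\bar{P}} - \lambda\right] p(\gamma_j) = 0. \tag{4.33}$$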



Solving for $P_j(\gamma_j)$ yields the same water-filling as the flat-fading case:

$$\frac{P_j(\gamma_j)}{\bar{P}} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma_j} & \gamma_j \ge \gamma_0 \\ 0 & \gamma_j < \gamma_0 \end{cases} \tag{4.34}$$

where the cutoff value $\gamma_0$ is obtained from the total power constraint over both time and frequency:

$$\sum_j \int_0^\infty P_j(\gamma_j)\, p(\gamma_j)\, d\gamma_j = \bar{P}. \tag{4.35}$$

Thus, the optimal power allocation (4.34) is a two-dimensional water-filling with a common cutoff value $\gamma_0$.
Dividing the constraint (4.35) by $\bar{P}$ and substituting in the optimal power allocation (4.34), we get that $\gamma_0$ must
satisfy

$$\sum_j \int_{\gamma_0}^\infty \left(\frac{1}{\gamma_0} - \frac{1}{\gamma_j}\right) p(\gamma_j)\, d\gamma_j = 1. \tag{4.36}$$

It is interesting to note that in the two-dimensional water-filling the cutoff value for all subchannels is the same.
This implies that even if the fading distribution or average fade power on the subchannels is different, all subchannels
suspend transmission when the instantaneous SNR falls below the common cutoff value $\gamma_0$. Substituting the
optimal power allocation (4.34) into the capacity expression (4.31) yields

$$C = \sum_j \int_{\gamma_0}^\infty B_c \log_2\!\left(\frac{\gamma_j}{\gamma_0}\right) p(\gamma_j)\, d\gamma_j. \tag{4.37}$$
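
A discrete analogue of the two-dimensional water-filling can be sketched in Python by replacing each integral in (4.36) and (4.37) with a sum over a finite set of SNR states per subchannel. Everything specific below is an assumption made for illustration: the function names, the representation of each subchannel as a list of `(snr, probability)` pairs, and the two-subchannel data with a 1 MHz coherence bandwidth. The common cutoff is found by bisection, which works because the left side of the constraint decreases as $\gamma_0$ grows.

```python
import math


def used_power(gamma0, subchannels):
    """Discrete analogue of the left side of (4.36)."""
    return sum(
        p * max(1.0 / gamma0 - 1.0 / g, 0.0)
        for states in subchannels
        for g, p in states
    )


def common_cutoff(subchannels, iters=80):
    """Bisection for the single cutoff shared by all subchannels."""
    lo = 1e-9
    hi = max(g for states in subchannels for g, _ in states)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used_power(mid, subchannels) > 1.0:
            lo = mid  # too much power used: raise the cutoff
        else:
            hi = mid
    return 0.5 * (lo + hi)


def capacity_2d(coherence_bw, subchannels):
    """Discrete analogue of (4.37)."""
    gamma0 = common_cutoff(subchannels)
    rate = sum(
        p * math.log2(g / gamma0)
        for states in subchannels
        for g, p in states
        if g >= gamma0
    )
    return coherence_bw * rate, gamma0


# Hypothetical example: two subchannels with different fading statistics.
subchannels = [
    [(1.0, 0.5), (10.0, 0.5)],
    [(0.5, 0.3), (20.0, 0.7)],
]
capacity, gamma0 = capacity_2d(1e6, subchannels)
print(f"common cutoff = {gamma0:.4f}, capacity = {capacity / 1e6:.3f} Mbps")
```

In this made-up example the weaker state of each subchannel falls below the common cutoff, so both subchannels suspend transmission in their bad states even though their fading statistics differ.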
Chapter 4 Problems



1. Capacity in AWGN is given by $C = B \log_2(1 + S/(N_0 B))$. Find capacity in the limit of infinite bandwidth
$B \to \infty$ as a function of $S$.

2. Consider an AWGN channel with bandwidth 50 MHz, received power 10 mW, and noise PSD $N_0 = 2 \times 10^{-9}$ W/Hz.
How much does capacity increase by doubling the received power? How much does capacity
increase by doubling the channel bandwidth?

3. Consider two users simultaneously transmitting to a single receiver in an AWGN channel. This is a typical 
scenario in a cellular system with multiple users sending signals to a base station. Assume the users have 
equal received power of 10 mW and total noise at the receiver in the bandwidth of interest of 0.1 mW. The 
channel bandwidth for each user is 20 MHz. 

(a) Suppose that the receiver decodes user 1’s signal first. In this decoding, user 2’s signal acts as noise (assume
it has the same statistics as AWGN). What is the capacity of user 1’s channel with this additional
interference noise?

(b) Suppose that after decoding user 1’s signal, the decoder re-encodes it and subtracts it out of the received
signal. Then in the decoding of user 2’s signal, there is no interference from user 1’s signal. What then
is the Shannon capacity of user 2’s channel?

Note: We will see in Chapter 14 that the decoding strategy of successively subtracting out decoded signals 
is optimal for achieving Shannon capacity of a multiuser channel with independent transmitters sending to 
one receiver. 

4. Consider a flat-fading channel of bandwidth 20 MHz where for a fixed transmit power $S$, the received SNR
is one of six values: $\gamma_1 = 20$ dB, $\gamma_2 = 15$ dB, $\gamma_3 = 10$ dB, $\gamma_4 = 5$ dB, $\gamma_5 = 0$ dB, and $\gamma_6 = -5$ dB.
The probability associated with each state is $p_1 = p_6 = .1$, $p_2 = p_4 = .15$, $p_3 = p_5 = .25$. Assume only the
receiver has CSI.

(a) Find the Shannon capacity of this channel.

(b) Plot the capacity versus outage for $0 \le p_{out} < 1$ and find the maximum average rate that can be
correctly received (maximum $C_o$).

5. Consider a flat-fading channel where for a fixed transmit power S, the received SNR is one of four values:
$\gamma_1 = 30$ dB, $\gamma_2 = 20$ dB, $\gamma_3 = 10$ dB, and $\gamma_4 = 0$ dB. The probability associated with each state is $p_1 = .2$,
$p_2 = .3$, $p_3 = .3$, and $p_4 = .2$. Assume both transmitter and receiver have CSI.

(a) Find the optimal power control policy S(i)/S for this channel and its corresponding Shannon capacity 
per unit Hertz (C / B). 

(b) Find the channel inversion power control policy for this channel and associated zero-outage capacity 
per unit bandwidth. 

(c) Find the truncated channel inversion power control policy for this channel and associated outage
capacity per unit bandwidth for 3 different outage probabilities: $p_{out} = .1$, $p_{out} = .01$, and $p_{out}$ (and the
associated cutoff $\gamma_0$) equal to the value that achieves maximum outage capacity.

6. Consider a cellular system where the power falloff with distance follows the formula $P_r(d) = P_t (d_0/d)^{\alpha}$,
where $d_0 = 100$ m and $\alpha$ is a random variable. The distribution for $\alpha$ is $p(\alpha = 2) = .4$, $p(\alpha = 2.5) = .3$,
$p(\alpha = 3) = .2$, and $p(\alpha = 4) = .1$. Assume a receiver at a distance d = 1000 m from the transmitter, with
an average transmit power constraint of $P_t = 100$ mW and a receiver noise power of .1 mW. Assume both
transmitter and receiver have CSI.

(a) Compute the distribution of the received SNR. 

(b) Derive the optimal power control policy for this channel and its corresponding Shannon capacity per 
unit Hertz ( C/B ). 

(c) Determine the zero-outage capacity per unit bandwidth of this channel. 

(d) Determine the maximum outage capacity per unit bandwidth of this channel. 

7. Assume a Rayleigh fading channel, where the transmitter and receiver have CSI and the distribution of the
fading SNR $p(\gamma)$ is exponential with mean $\bar{\gamma} = 10$ dB. Assume a channel bandwidth of 10 MHz.

(a) Find the cutoff value $\gamma_0$ and the corresponding power adaptation that achieves Shannon capacity on
this channel.

(b) Compute the Shannon capacity of this channel. 

(c) Compare your answer in part (b) with the channel capacity in AWGN with the same average SNR. 

(d) Compare your answer in part (b) with the Shannon capacity where only the receiver knows $\gamma[i]$.

(e) Compare your answer in part (b) with the zero-outage capacity and outage capacity with outage
probability .05.

(f) Repeat parts (b), (c), and (d) (i.e. obtain the Shannon capacity with perfect transmitter and receiver side
information, in AWGN for the same average power, and with just receiver side information) for the
same fading distribution but with mean $\bar{\gamma} = -5$ dB. Describe the circumstances under which a fading
channel has higher capacity than an AWGN channel with the same average SNR and explain why this
behavior occurs.

8. Time-Varying Interference: This problem illustrates the capacity gains that can be obtained from interference
estimation, and how a malicious jammer can wreak havoc on link performance. Consider the following 
interference channel. 




The channel has a combination of AWGN n[k] and interference I[k]. We model I[k] as AWGN. The interferer
is on (i.e. the switch is down) with probability .25 and off (i.e. the switch is up) with probability .75.
The average transmit power is 10 mW, the noise spectral density is $10^{-8}$ W/Hz, the channel bandwidth B is
10 kHz (receiver noise power is $N_0 B$), and the interference power (when on) is 9 mW.

(a) What is the Shannon capacity of the channel if neither transmitter nor receiver know when the interferer 
is on? 

(b) What is the capacity of the channel if both transmitter and receiver know when the interferer is on?

(c) Suppose now that the interferer is a malicious jammer with perfect knowledge of x[k] (so the interferer
is no longer modeled as AWGN). Assume that neither transmitter nor receiver have knowledge of the 
jammer behavior. Assume also that the jammer is always on and has an average transmit power of 10 
mW. What strategy should the jammer use to minimize the SNR of the received signal? 



9. Consider the malicious interferer from the previous problem. Suppose that the transmitter knows the inter- 
ference signal perfectly. Consider two possible transmit strategies under this scenario: the transmitter can 
ignore the interference and use all its power for sending its signal, or it can use some of its power to cancel 
out the interferer (i.e. transmit the negative of the interference signal). In the first approach the interferer 
will degrade capacity by increasing the noise, and in the second strategy the interferer also degrades capacity 
since the transmitter sacrifices some power to cancel out the interference. Which strategy results in higher 
capacity? Note: there is a third strategy, where the encoder actually exploits the structure of the interfer- 
ence in its encoding. This strategy is called dirty paper coding, and is used to achieve Shannon capacity on 
broadcast channels with multiple antennas. 

10. Show using Lagrangian techniques that the optimal power allocation to maximize the capacity of a
time-invariant block fading channel is given by the water-filling formula in (4.24).

11. Consider a time-invariant block fading channel with frequency response

$$H(f) = \begin{cases} 1 & f_c - 20\,\text{MHz} \le f < f_c - 10\,\text{MHz} \\ .5 & f_c - 10\,\text{MHz} \le f < f_c \\ 2 & f_c \le f < f_c + 10\,\text{MHz} \\ .25 & f_c + 10\,\text{MHz} \le f < f_c + 20\,\text{MHz} \\ 0 & \text{else} \end{cases}$$

For a transmit power of 10 mW and a noise power spectral density of .001 μW per Hertz, find the optimal
power allocation and corresponding Shannon capacity of this channel.



12. Show that the optimal power allocation to maximize the capacity of a time-invariant frequency selective
fading channel is given by the water-filling formula in (4.28).



13. Consider a frequency-selective fading channel with total bandwidth 12 MHz and coherence bandwidth $B_c = 4$ MHz.
Divide the total bandwidth into 3 subchannels of bandwidth $B_c$, and assume that each subchannel is
a Rayleigh flat-fading channel with independent fading on each subchannel. Assume the subchannels have
average gains $E[|H_1(t)|^2] = 1$, $E[|H_2(t)|^2] = .5$, and $E[|H_3(t)|^2] = .125$. Assume a total transmit power of
30 mW, and a receiver noise spectral density of .001 μW per Hertz.



(a) Find the optimal two-dimensional water-filling power adaptation for this channel and the corresponding
Shannon capacity, assuming both transmitter and receiver know the instantaneous value of $H_j(t)$, j = 1, 2, 3.

(b) Compare the capacity of part (a) with that obtained by allocating an equal average power of 10 mW to
each subchannel and then water-filling on each subchannel relative to this power allocation.

Chapter 5

Digital Modulation and Detection 



The advances over the last several decades in hardware and digital signal processing have made digital transceivers 
much cheaper, faster, and more power-efficient than analog transceivers. More importantly, digital modulation 
offers a number of other advantages over analog modulation, including higher data rates, powerful error correction 
techniques, resistance to channel impairments, more efficient multiple access strategies, and better security and 
privacy. Specifically, high level modulation techniques such as MQAM allow much higher data rates in digital 
modulation as compared to analog modulation with the same signal bandwidth. Advances in coding and coded- 
modulation applied to digital signaling make the signal much less susceptible to noise and fading, and equalization 
or multicarrier techniques can be used to mitigate ISI. Spread spectrum techniques applied to digital modulation
can remove or combine multipath, resist interference, and detect multiple users simultaneously. Finally, digital 
modulation is much easier to encrypt, resulting in a higher level of security and privacy for digital systems. For all 
these reasons, systems currently being built or proposed for wireless applications are all digital systems. 

Digital modulation and detection consist of transferring information in the form of bits over a communications 
channel. The bits are binary digits taking on the values of either 1 or 0. These information bits are derived 
from the information source, which may be a digital source or an analog source that has been passed through an 
A/D converter. Both digital and A/D converted analog sources may be compressed to obtain the information bit 
sequence. Digital modulation consists of mapping the information bits into an analog signal for transmission over 
the channel. Detection consists of determining the original bit sequence based on the signal received over the 
channel. The main considerations in choosing a particular digital modulation technique are 

• high data rate 

• high spectral efficiency (minimum bandwidth occupancy) 

• high power efficiency (minimum required transmit power) 

• robustness to channel impairments (minimum probability of bit error) 

• low power/cost implementation 

Often these are conflicting requirements, and the choice of modulation is based on finding the technique that 
achieves the best tradeoff between these requirements. 

There are two main categories of digital modulation: amplitude/phase modulation and frequency modulation. 
Since frequency modulation typically has a constant signal envelope and is generated using nonlinear techniques, 
this modulation is also called constant envelope modulation or nonlinear modulation, and amplitude/phase mod- 
ulation is also called linear modulation. Linear modulation generally has better spectral properties than nonlinear 
modulation, since nonlinear processing leads to spectral broadening. However, amplitude and phase modulation 
embeds the information bits into the amplitude or phase of the transmitted signal, which is more susceptible to 
variations from fading and interference. In addition, amplitude and phase modulation techniques typically require
linear amplifiers, which are more expensive and less power efficient than the nonlinear amplifiers that can be used
with nonlinear modulation. Thus, the general tradeoff of linear versus nonlinear modulation is one of better spec- 
tral efficiency for the former technique and better power efficiency and resistance to channel impairments for the 
latter technique. Once the modulation technique is determined, the constellation size must be chosen. Modulations 
with large constellations have higher data rates for a given signal bandwidth, but are more susceptible to noise, 
fading, and hardware imperfections. Finally, the simplest demodulators require a coherent phase reference with re- 
spect to the transmitted signal. This coherent reference may be difficult to obtain or significantly increase receiver 
complexity. Thus, modulation techniques that do not require a coherent phase reference are desirable. 

We begin this chapter with a general discussion of signal space concepts. These concepts greatly simplify 
the design and analysis of modulation and demodulation techniques by mapping infinite-dimensional signals to 
a finite-dimensional vector-space. The general principles of signal space analysis will then be applied to the 
analysis of amplitude and phase modulation techniques, including pulse amplitude modulation (PAM), phase- 
shift keying (PSK), and quadrature amplitude modulation (QAM). We will also discuss constellation shaping 
and quadrature offset techniques for these modulations, as well as differential encoding to avoid the need for 
a coherent phase reference. We then describe frequency modulation techniques and their properties, including 
frequency shift keying (FSK), minimum-shift keying (MSK), and continuous-phase FSK (CPFSK). Both coherent 
and noncoherent detection of these techniques will be discussed. Pulse shaping techniques to improve the spectral 
properties of the modulated signals will also be covered, along with issues associated with carrier phase recovery 
and symbol synchronization. 

5.1 Signal Space Analysis 

Digital modulation encodes a bit stream of finite length into one of several possible transmitted signals. Intuitively, 
the receiver minimizes the probability of detection error by decoding the received signal as the signal in the set of 
possible transmitted signals that is “closest” to the one received. Determining the distance between the transmitted 
and received signals requires a metric for the distance between signals. By representing signals as projections 
onto a set of basis functions, we obtain a one-to-one correspondence between the set of transmitted signals and 
their vector representations. Thus, we can analyze signals in finite-dimensional vector space instead of infinite- 
dimensional function space, using classical notions of distance for vector spaces. In this section we show how 
digitally modulated signals can be represented as vectors in an appropriately-defined vector space, and how optimal 
demodulation methods can be obtained from this vector space representation. This general analysis will then be 
applied to specific modulation techniques in later sections. 

5.1.1 Signal and System Model 

Consider the communication system model shown in Figure 5.1. Every T seconds, the system sends $K = \log_2 M$
bits of information through the channel for a data rate of $R = K/T$ bits per second (bps). There are $M = 2^K$
possible sequences of K bits, and we say that each bit sequence of length K comprises a message
$m_i = \{b_1, \ldots, b_K\} \in \mathcal{M}$, where $\mathcal{M} = \{m_1, \ldots, m_M\}$ is the set of all such messages. The messages have probability
$p_i$ of being selected for transmission, where $\sum_{i=1}^{M} p_i = 1$.

Suppose message $m_i$ is to be transmitted over the channel during the time interval [0, T). Since the channel
is analog, the message must be embedded into an analog signal for channel transmission. Thus, each message
$m_i \in \mathcal{M}$ is mapped to a unique analog signal $s_i(t) \in \mathcal{S} = \{s_1(t), \ldots, s_M(t)\}$ where $s_i(t)$ is defined on the time
interval [0, T) and has energy

$$E_{s_i} = \int_0^T s_i^2(t)\,dt, \quad i = 1, \ldots, M. \qquad (5.1)$$




Figure 5.1: Communication System Model 



Since each message represents a bit sequence, each signal $s_i(t) \in \mathcal{S}$ also represents a bit sequence, and detection
of the transmitted signal $s_i(t)$ at the receiver is equivalent to detection of the transmitted bit sequence. When
messages are sent sequentially, the transmitted signal becomes a sequence of the corresponding analog signals
over each time interval [kT, (k + 1)T): $s(t) = \sum_k s_i(t - kT)$, where $s_i(t)$ is the analog signal corresponding to
the message $m_i$ designated for the transmission interval [kT, (k + 1)T). This is illustrated in Figure 5.2, where we
show the transmitted signal $s(t) = s_1(t) + s_2(t - T) + s_1(t - 2T) + s_1(t - 3T)$ corresponding to the string of
messages $m_1, m_2, m_1, m_1$ with message $m_i$ mapped to signal $s_i(t)$.
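
The construction of s(t) from a message sequence can be sketched in a few lines of Python. This is only an illustration: the two example waveforms, the carrier frequency, and the helper names `example_signals` and `transmitted_signal` are assumptions made for the sketch, not part of the system model above.

```python
import math

def example_signals(T=1.0, fc=4.0, a=1.0):
    """Two illustrative signals s_1(t), s_2(t) on [0, T) (assumed, BPSK-like)."""
    return {
        1: lambda t: a * math.cos(2 * math.pi * fc * t),
        2: lambda t: -a * math.cos(2 * math.pi * fc * t),
    }

def transmitted_signal(messages, signals, T):
    """Return s(t) = sum_k s_{i_k}(t - kT) for a list of message indices i_k."""
    def s(t):
        k = int(math.floor(t / T))           # index of the current symbol interval
        if k < 0 or k >= len(messages):
            return 0.0                       # nothing is sent outside the sequence
        return signals[messages[k]](t - k * T)
    return s

T = 1.0
signals = example_signals(T=T)
# Message string m_1, m_2, m_1, m_1, as in the Figure 5.2 description.
s = transmitted_signal([1, 2, 1, 1], signals, T)
```

Because each $s_i(t)$ is confined to [0, T), only one term of the sum is nonzero at any time, which is why the sketch selects the active interval rather than summing over all k.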




Figure 5.2: Transmitted Signal for a Sequence of Messages 

In the model of Figure 5.1, the transmitted signal is sent through an AWGN channel, where a white Gaussian
noise process n(t) of power spectral density $N_0/2$ is added to form the received signal r(t) = s(t) + n(t). Given
r(t) the receiver must determine the best estimate of which $s_i(t) \in \mathcal{S}$ was transmitted during each transmission
interval [kT, (k + 1)T). This best estimate for $s_i(t)$ is mapped to a best estimate of the message $m_i \in \mathcal{M}$, and
the receiver then outputs this best estimate $\hat{m} = \{\hat{b}_1, \ldots, \hat{b}_K\} \in \mathcal{M}$ of the transmitted bit sequence.

The goal of the receiver design in estimating the transmitted message is to minimize the probability of message
error:

$$P_e = \sum_{i=1}^{M} p(\hat{m} \ne m_i \mid m_i \text{ sent})\, p(m_i \text{ sent}) \qquad (5.2)$$

over each time interval [kT, (k + 1)T). By representing the signals $\{s_i(t), i = 1, \ldots, M\}$ geometrically, we can
solve for the optimal receiver design in AWGN based on a minimum distance criterion. Note that, as we saw in 
previous chapters, wireless channels typically have a time-varying impulse response in addition to AWGN. We 
will consider the effect of an arbitrary channel impulse response on digital modulation performance in Chapter 6, 
and methods to combat this performance degradation in Chapters 11-13. 

5.1.2 Geometric Representation of Signals 

The basic premise behind a geometrical representation of signals is the notion of a basis set. Specifically, using
a Gram-Schmidt orthogonalization procedure [2, 3], it can be shown that any set of M real energy signals
$\mathcal{S} = \{s_1(t), \ldots, s_M(t)\}$ defined on [0, T) can be represented as a linear combination of $N \le M$ real orthonormal basis
functions $\{\phi_1(t), \ldots, \phi_N(t)\}$. We say that these basis functions span the set $\mathcal{S}$. Thus, we can write each $s_i(t) \in \mathcal{S}$
in terms of its basis function representation as

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\phi_j(t), \quad 0 \le t < T, \qquad (5.3)$$

where

$$s_{ij} = \int_0^T s_i(t)\phi_j(t)\,dt \qquad (5.4)$$

is a real coefficient representing the projection of $s_i(t)$ onto the basis function $\phi_j(t)$ and

$$\int_0^T \phi_i(t)\phi_j(t)\,dt = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases} \qquad (5.5)$$

If the signals $\{s_i(t)\}$ are linearly independent then N = M, otherwise N < M. Moreover, the minimum number
N of basis functions needed to represent any signal $s_i(t)$ of duration T and bandwidth B is roughly 2BT [4,
Chapter 5.3]. The signal $s_i(t)$ thus occupies a signal space of dimension 2BT.
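
A numerical sketch of the Gram-Schmidt step may help make the claim $N \le M$ concrete. The sampling grid, the four example signals, and the function names below are assumptions chosen for illustration; integrals over [0, T) are approximated by Riemann sums on midpoint samples.

```python
import math

def inner(x, y, dt):
    """Riemann-sum approximation of the integral of x(t) y(t) over [0, T)."""
    return sum(a * b for a, b in zip(x, y)) * dt

def gram_schmidt(signals, dt, tol=1e-9):
    """Return sampled orthonormal basis functions spanning the given signals."""
    basis = []
    for s in signals:
        residual = list(s)
        for phi in basis:
            c = inner(residual, phi, dt)                 # projection onto phi
            residual = [r - c * p for r, p in zip(residual, phi)]
        energy = inner(residual, residual, dt)
        if energy > tol:                                 # keep only new directions
            norm = math.sqrt(energy)
            basis.append([r / norm for r in residual])
    return basis

# Illustrative setup (assumed values): M = 4 signals built from one carrier.
T, fc, n = 1.0, 5.0, 1000
dt = T / n
t = [(k + 0.5) * dt for k in range(n)]
cos_c = [math.cos(2 * math.pi * fc * x) for x in t]
sin_c = [math.sin(2 * math.pi * fc * x) for x in t]
signals = [
    cos_c,
    [-c for c in cos_c],
    sin_c,
    [c + s for c, s in zip(cos_c, sin_c)],
]

basis = gram_schmidt(signals, dt)
# Coefficients s_ij from (5.4): projection of each signal onto each basis function.
coeffs = [[inner(s, phi, dt) for phi in basis] for s in signals]
```

Here the four signals are linearly dependent, so the procedure returns only two basis functions (N = 2 < M = 4), and each row of `coeffs` is the constellation point of the corresponding signal.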

For linear passband modulation techniques, the basis set consists of the sine and cosine functions:

$$\phi_1(t) = \sqrt{\frac{2}{T}} \cos(2\pi f_c t) \qquad (5.6)$$

and

$$\phi_2(t) = \sqrt{\frac{2}{T}} \sin(2\pi f_c t). \qquad (5.7)$$


The $\sqrt{2/T}$ factor is needed for normalization so that $\int_0^T \phi_i^2(t)\,dt = 1$, i = 1, 2. In fact, with these basis functions
we only get an approximation to (5.5), since

$$\int_0^T \phi_1^2(t)\,dt = \frac{2}{T}\int_0^T .5\,[1 + \cos(4\pi f_c t)]\,dt = 1 + \frac{\sin(4\pi f_c T)}{4\pi f_c T} \approx 1. \qquad (5.8)$$

The numerator in the second term of (5.8) is bounded by one and for $f_c T \gg 1$ the denominator of this term is
very large. Thus, this second term can be neglected. Similarly,

$$\int_0^T \phi_1(t)\phi_2(t)\,dt = \frac{2}{T}\int_0^T .5 \sin(4\pi f_c t)\,dt = \frac{1 - \cos(4\pi f_c T)}{4\pi f_c T} \approx 0, \qquad (5.9)$$

where the approximation is taken as an equality for $f_c T \gg 1$.
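
The quality of these approximations can be checked numerically. The sketch below (function names and the two test values of $f_c T$ are assumptions) integrates $\phi_1^2(t)$ and $\phi_1(t)\phi_2(t)$ with a midpoint rule and compares the results with the closed forms obtained by direct integration, $1 + \sin(4\pi f_c T)/(4\pi f_c T)$ and $(1 - \cos(4\pi f_c T))/(4\pi f_c T)$.

```python
import math

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def basis_checks(fc, T):
    """Return (integral of phi_1^2, integral of phi_1 * phi_2) over [0, T]."""
    phi1 = lambda t: math.sqrt(2 / T) * math.cos(2 * math.pi * fc * t)
    phi2 = lambda t: math.sqrt(2 / T) * math.sin(2 * math.pi * fc * t)
    energy = integrate(lambda t: phi1(t) ** 2, 0.0, T)
    cross = integrate(lambda t: phi1(t) * phi2(t), 0.0, T)
    return energy, cross

def closed_forms(fc, T):
    """Closed-form values of the same two integrals."""
    x = 4 * math.pi * fc * T
    return 1 + math.sin(x) / x, (1 - math.cos(x)) / x

# fc*T = 0.3 violates fc*T >> 1; fc*T = 100.3 satisfies it.
small = basis_checks(0.3, 1.0)
large = basis_checks(100.3, 1.0)
```

With $f_c T = 0.3$ the basis functions are visibly not orthonormal, while with $f_c T = 100.3$ both integrals are within about $10^{-3}$ of their ideal values of 1 and 0.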

With the basis set $\phi_1(t) = \sqrt{2/T} \cos(2\pi f_c t)$ and $\phi_2(t) = \sqrt{2/T} \sin(2\pi f_c t)$ the basis function
representation (5.3) corresponds to the complex baseband representation of $s_i(t)$ in terms of its in-phase and quadrature
components with an extra factor of $\sqrt{2/T}$:

$$s_i(t) = s_{i1}\sqrt{\frac{2}{T}} \cos(2\pi f_c t) + s_{i2}\sqrt{\frac{2}{T}} \sin(2\pi f_c t). \qquad (5.10)$$


Note that the carrier basis functions may have an initial phase offset $\phi_0$. The basis set may also include a baseband
pulse-shaping filter g(t) to improve the spectral characteristics of the transmitted signal:

$$s_i(t) = s_{i1}\, g(t) \cos(2\pi f_c t) + s_{i2}\, g(t) \sin(2\pi f_c t). \qquad (5.11)$$

In this case the pulse shape g(t) must maintain the orthonormal properties (5.5) of basis functions, i.e. we must
have

$$\int_0^T g^2(t) \cos^2(2\pi f_c t)\,dt = 1 \qquad (5.12)$$

and

$$\int_0^T g^2(t) \cos(2\pi f_c t) \sin(2\pi f_c t)\,dt = 0, \qquad (5.13)$$

where the equalities may be approximations for $f_c T \gg 1$ as in (5.8) and (5.9) above. If the bandwidth of g(t)
satisfies $B \ll f_c$ then $g^2(t)$ is roughly constant over $T_c$, so (5.13) is approximately true since the sine and cosine
functions are orthogonal over one period $T_c = 1/f_c$. The simplest pulse shape that satisfies (5.12) and (5.13) is the
rectangular pulse shape $g(t) = \sqrt{2/T}$, $0 \le t < T$.





Example 5.1:

Binary phase shift keying (BPSK) modulation transmits the signal $s_1(t) = a \cos(2\pi f_c t)$, $0 \le t < T$, to send a 1
bit and the signal $s_2(t) = -a \cos(2\pi f_c t)$, $0 \le t < T$, to send a 0 bit. Find the set of orthonormal basis functions
and coefficients $\{s_{ij}\}$ for this modulation.

Solution: There is only one basis function for $s_1(t)$ and $s_2(t)$, $\phi(t) = \sqrt{2/T} \cos(2\pi f_c t)$, where the $\sqrt{2/T}$ is
needed for normalization. The coefficients are then given by $s_1 = a\sqrt{T/2}$ and $s_2 = -a\sqrt{T/2}$.
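
As a quick numerical check of this example, the projection integral (5.4) can be evaluated for the two BPSK signals. The amplitude, symbol time, and carrier frequency below are assumed values, chosen so that $f_c T$ is an integer and the normalization is exact; `project` is a hypothetical helper name.

```python
import math

def project(signal, basis, T, n=10000):
    """Midpoint-rule approximation of the integral of signal(t) * basis(t) over [0, T)."""
    h = T / n
    return sum(signal((k + 0.5) * h) * basis((k + 0.5) * h) for k in range(n)) * h

a, T, fc = 2.0, 1e-3, 1e5        # illustrative values; fc * T = 100
phi = lambda t: math.sqrt(2 / T) * math.cos(2 * math.pi * fc * t)
s1 = lambda t: a * math.cos(2 * math.pi * fc * t)
s2 = lambda t: -a * math.cos(2 * math.pi * fc * t)

c1 = project(s1, phi, T)         # numerical s_1
c2 = project(s2, phi, T)         # numerical s_2
expected = a * math.sqrt(T / 2)  # closed form from the solution above
```

The numerical coefficients agree with $\pm a\sqrt{T/2}$, so the two BPSK signals map to two antipodal points on a one-dimensional signal space.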



We denote the coefficients $\{s_{ij}\}$ as a vector $\mathbf{s}_i = (s_{i1}, \ldots, s_{iN}) \in \mathbb{R}^N$ which is called the signal constellation
point corresponding to the signal $s_i(t)$. The signal constellation consists of all constellation points $\{\mathbf{s}_1, \ldots, \mathbf{s}_M\}$.
Given the basis functions $\{\phi_1(t), \ldots, \phi_N(t)\}$ there is a one-to-one correspondence between the transmitted signal
$s_i(t)$ and its constellation point $\mathbf{s}_i$. Specifically, $s_i(t)$ can be obtained from $\mathbf{s}_i$ by (5.3) and $\mathbf{s}_i$ can be obtained from
$s_i(t)$ by (5.4). Thus, it is equivalent to characterize the transmitted signal by $s_i(t)$ or $\mathbf{s}_i$. The representation of $s_i(t)$
in terms of its constellation point $\mathbf{s}_i \in \mathbb{R}^N$ is called its signal space representation and the vector space containing
the constellation is called the signal space. A two-dimensional signal space is illustrated in Figure 5.3, where we
show $\mathbf{s}_i \in \mathbb{R}^2$ with the ith axis of $\mathbb{R}^2$ corresponding to the basis function $\phi_i(t)$, i = 1, 2. With this signal space
representation we can analyze the infinite-dimensional functions $s_i(t)$ as vectors $\mathbf{s}_i$ in finite-dimensional vector
space $\mathbb{R}^2$. This greatly simplifies the analysis of the system performance as well as the derivation of the optimal
receiver design. Signal space representations for common modulation techniques like MPSK and MQAM are
two-dimensional (corresponding to the in-phase and quadrature basis functions), and will be given later in the
chapter.

In order to analyze signals via a signal space representation, we require a few definitions for vector
characterization in the vector space $\mathbb{R}^N$. The length of a vector in $\mathbb{R}^N$ is defined as

$$\|\mathbf{s}_i\| = \sqrt{\sum_{j=1}^{N} s_{ij}^2}. \qquad (5.14)$$

The distance between two signal constellation points $\mathbf{s}_i$ and $\mathbf{s}_k$ is thus

$$\|\mathbf{s}_i - \mathbf{s}_k\| = \sqrt{\sum_{j=1}^{N} (s_{ij} - s_{kj})^2} = \sqrt{\int_0^T (s_i(t) - s_k(t))^2\,dt}, \qquad (5.15)$$

Figure 5.3: Signal Space Representation



where the second equality is obtained by writing $s_i(t)$ and $s_k(t)$ in their basis representation (5.3) and using the
orthonormal properties of the basis functions. Finally, the inner product $\langle s_i(t), s_k(t) \rangle$ between two real signals
$s_i(t)$ and $s_k(t)$ on the interval [0, T] is

$$\langle s_i(t), s_k(t) \rangle = \int_0^T s_i(t) s_k(t)\,dt. \qquad (5.16)$$



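Similarly, the inner product $\langle \mathbf{s}_i, \mathbf{s}_k \rangle$ between two real vectors is

$$\langle \mathbf{s}_i, \mathbf{s}_k \rangle = \mathbf{s}_i \mathbf{s}_k^T = \int_0^T s_i(t) s_k(t)\,dt = \langle s_i(t), s_k(t) \rangle, \qquad (5.17)$$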



where the equality between the vector inner product and the corresponding signal inner product follows from the 
basis representation of the signals (5.3) and the orthonormal property of the basis functions (5.5). We say that two 
signals are orthogonal if their inner product is zero. Thus, by (5.5), the basis functions are orthogonal functions. 
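
The equivalence between vector-space and signal-space quantities can be verified numerically. In this sketch the two constellation points, the carrier settings, and the helper names are assumptions for illustration; $f_c T$ is an integer so that the sampled sine and cosine basis functions are orthonormal to within rounding error.

```python
import math

T, fc, n = 1.0, 10.0, 4000       # assumed values; fc * T is an integer
h = T / n
times = [(k + 0.5) * h for k in range(n)]
phi1 = [math.sqrt(2 / T) * math.cos(2 * math.pi * fc * t) for t in times]
phi2 = [math.sqrt(2 / T) * math.sin(2 * math.pi * fc * t) for t in times]

def synthesize(point):
    """Build sampled s(t) = s_1 phi_1(t) + s_2 phi_2(t) from a constellation point."""
    return [point[0] * p1 + point[1] * p2 for p1, p2 in zip(phi1, phi2)]

def signal_inner(x, y):
    """Riemann-sum approximation of the signal inner product on [0, T]."""
    return sum(a * b for a, b in zip(x, y)) * h

s_i, s_k = (1.0, 1.0), (-1.0, 2.0)           # illustrative constellation points
x_i, x_k = synthesize(s_i), synthesize(s_k)

vector_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_i, s_k)))
diff = [a - b for a, b in zip(x_i, x_k)]
signal_distance = math.sqrt(signal_inner(diff, diff))

vector_inner = sum(a * b for a, b in zip(s_i, s_k))
waveform_inner = signal_inner(x_i, x_k)
```

Both distances evaluate to $\sqrt{5}$ and both inner products to 1, which is the practical content of (5.15) and (5.17): computations on waveforms can be replaced by computations on short coefficient vectors.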



5.1.3 Receiver Structure and Sufficient Statistics 



Given the channel output $r(t) = s_i(t) + n(t)$, $0 \le t < T$, we now investigate the receiver structure to determine
which constellation point $\mathbf{s}_i$ or, equivalently, which message $m_i$, was sent over the time interval [0, T). A similar
procedure is done for each time interval [kT, (k + 1)T). We would like to convert the received signal r(t) over each
time interval into a vector, as it allows us to work in finite-dimensional vector space to estimate the transmitted
signal. However, this conversion should not compromise the estimation accuracy. For this conversion, consider the
receiver structure shown in Figure 5.4, where

$$s_{ij} = \int_0^T s_i(t)\phi_j(t)\,dt, \qquad (5.18)$$

and

$$n_j = \int_0^T n(t)\phi_j(t)\,dt. \qquad (5.19)$$

We can rewrite r(t) as

$$r(t) = \sum_{j=1}^{N} (s_{ij} + n_j)\phi_j(t) + n_r(t) = \sum_{j=1}^{N} r_j\phi_j(t) + n_r(t), \qquad (5.20)$$

where $r_j = s_{ij} + n_j$ and $n_r(t) = n(t) - \sum_{j=1}^{N} n_j\phi_j(t)$ denotes the "remainder" noise, which is the component
of the noise orthogonal to the signal space. If we can show that the optimal detection of the transmitted signal
constellation point $\mathbf{s}_i$ given received signal r(t) does not make use of the remainder noise $n_r(t)$, then the receiver
can make its estimate $\hat{m}$ of the transmitted message $m_i$ as a function of $\mathbf{r} = (r_1, \ldots, r_N)$ alone. In other words,
$\mathbf{r} = (r_1, \ldots, r_N)$ is a sufficient statistic for r(t) in the optimal detection of the transmitted messages.




Figure 5.4: Receiver Structure for Signal Detection in AWGN. 

It is intuitively clear that the remainder noise $n_r(t)$ should not help in detecting the transmitted signal $s_i(t)$
since its projection onto the signal space is zero. This is illustrated in Figure 5.5, where we assume the signal lies
in a space spanned by the basis set $(\phi_1(t), \phi_2(t))$ while the remainder noise lies in a space spanned by the basis
function $\phi_{n_r}(t)$, which is orthogonal to $\phi_1(t)$ and $\phi_2(t)$. The vector space in the figure shows the projection of the
received signal onto each of these basis functions. Specifically, the remainder noise in Figure 5.5 is represented
by $\mathbf{n}_r$, where $n_r(t) = n_r\phi_{n_r}(t)$. The received signal is represented by $\mathbf{r} + \mathbf{n}_r$. From the figure it appears that
projecting $\mathbf{r} + \mathbf{n}_r$ onto $\mathbf{r}$ will not compromise the detection of which constellation $\mathbf{s}_i$ was transmitted, since $\mathbf{n}_r$ lies
in a space orthogonal to the space where $\mathbf{s}_i$ lies. We now proceed to show mathematically why this intuition is
correct.




Figure 5.5: Projection of Received Signal onto Received Vector r. 

Let us first examine the distribution of $\mathbf{r}$. Since n(t) is a Gaussian random process, if we condition on
the transmitted signal $s_i(t)$ then the channel output $r(t) = s_i(t) + n(t)$ is also a Gaussian random process and
$\mathbf{r} = (r_1, \ldots, r_N)$ is a Gaussian random vector. Recall that $r_j = s_{ij} + n_j$. Thus, conditioned on a transmitted
constellation $\mathbf{s}_i$, we have that

$$\mu_{r_j|\mathbf{s}_i} = E[r_j \mid \mathbf{s}_i] = E[s_{ij} + n_j \mid s_{ij}] = s_{ij} \qquad (5.21)$$

since n(t) has zero mean, and

$$\sigma^2_{r_j|\mathbf{s}_i} = E[(r_j - \mu_{r_j|\mathbf{s}_i})^2 \mid \mathbf{s}_i] = E[(s_{ij} + n_j - s_{ij})^2 \mid s_{ij}] = E[n_j^2]. \qquad (5.22)$$
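Moreover,

$$\begin{aligned}
\mathrm{Cov}[r_j r_k \mid \mathbf{s}_i] &= E[(r_j - \mu_{r_j})(r_k - \mu_{r_k}) \mid \mathbf{s}_i] \\
&= E[n_j n_k] \\
&= E\left[\int_0^T n(t)\phi_j(t)\,dt \int_0^T n(\tau)\phi_k(\tau)\,d\tau\right] \\
&= \int_0^T\!\!\int_0^T E[n(t)n(\tau)]\,\phi_j(t)\phi_k(\tau)\,dt\,d\tau \\
&= \int_0^T\!\!\int_0^T \frac{N_0}{2}\delta(t - \tau)\,\phi_j(t)\phi_k(\tau)\,dt\,d\tau \\
&= \frac{N_0}{2}\int_0^T \phi_j(t)\phi_k(t)\,dt \\
&= \begin{cases} N_0/2 & j = k \\ 0 & j \ne k \end{cases}
\end{aligned} \qquad (5.23)$$

where the last equality follows from the orthogonality of the basis functions. Thus, conditioned on the transmitted
constellation $\mathbf{s}_i$, the $r_j$'s are uncorrelated and, since they are Gaussian, they are also independent. Moreover
$E[n_j^2] = N_0/2$.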

We have shown that, conditioned on the transmitted constellation $\mathbf{s}_i$, $r_j$ is a Gauss-distributed random variable
that is independent of $r_k$, $k \ne j$ and has mean $s_{ij}$ and variance $N_0/2$. Thus, the conditional distribution of $\mathbf{r}$ is
given by

$$p(\mathbf{r} \mid \mathbf{s}_i \text{ sent}) = \prod_{j=1}^{N} p(r_j \mid m_i) = \frac{1}{(\pi N_0)^{N/2}} \exp\left[-\frac{1}{N_0}\sum_{j=1}^{N} (r_j - s_{ij})^2\right]. \qquad (5.24)$$



It is also straightforward to show that $E[r_j n_r(t) \mid \mathbf{s}_i] = 0$ for any t, $0 \le t < T$. Thus, since $r_j$ conditioned on $\mathbf{s}_i$
and $n_r(t)$ are Gaussian and uncorrelated, they are independent. Also, since the transmitted signal is independent
of the noise, $s_{ij}$ is independent of the process $n_r(t)$.
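
The statistics of the noise projections can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions: continuous-time white noise of power spectral density $N_0/2$ is replaced by independent Gaussian samples of variance $(N_0/2)/\Delta t$, and the sample count, carrier frequency, seed, and the function name `noise_projections` are arbitrary choices.

```python
import math
import random

def noise_projections(N0, T=1.0, fc=5.0, n=100, trials=4000, seed=0):
    """Project sampled white Gaussian noise onto phi_1, phi_2; return (n_1, n_2) pairs."""
    rng = random.Random(seed)
    dt = T / n
    times = [(k + 0.5) * dt for k in range(n)]
    phi1 = [math.sqrt(2 / T) * math.cos(2 * math.pi * fc * t) for t in times]
    phi2 = [math.sqrt(2 / T) * math.sin(2 * math.pi * fc * t) for t in times]
    sigma = math.sqrt(N0 / 2 / dt)       # discrete-time stand-in for PSD N0/2
    pairs = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma) for _ in range(n)]
        n1 = sum(x * p for x, p in zip(noise, phi1)) * dt
        n2 = sum(x * p for x, p in zip(noise, phi2)) * dt
        pairs.append((n1, n2))
    return pairs

N0 = 2.0
pairs = noise_projections(N0)
var1 = sum(a * a for a, _ in pairs) / len(pairs)    # estimate of E[n_1^2]
var2 = sum(b * b for _, b in pairs) / len(pairs)    # estimate of E[n_2^2]
cov12 = sum(a * b for a, b in pairs) / len(pairs)   # estimate of E[n_1 n_2]
```

The estimated variances should come out close to $N_0/2 = 1$ and the estimated cross-correlation close to zero, in line with (5.23); the exact values depend on the random seed and the number of trials.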

We now discuss the receiver design criterion and show it is not affected by discarding $n_r(t)$. The goal of
the receiver design is to minimize the probability of error in detecting the transmitted message $m_i$ given received
signal r(t). To minimize $P_e = p(\hat{m} \ne m_i \mid r(t)) = 1 - p(\hat{m} = m_i \mid r(t))$ we maximize $p(\hat{m} = m_i \mid r(t))$.
Therefore, the receiver output $\hat{m}$ given received signal r(t) should correspond to the message $m_i$ that maximizes
$p(m_i \text{ sent} \mid r(t))$. Since there is a one-to-one mapping between messages and signal constellation points, this is
equivalent to maximizing $p(\mathbf{s}_i \text{ sent} \mid r(t))$. Recalling that r(t) is completely described by $\mathbf{r} = (r_1, \ldots, r_N)$ and
$n_r(t)$, we have

$$\begin{aligned}
p(\mathbf{s}_i \text{ sent} \mid r(t)) &= p((s_{i1}, \ldots, s_{iN}) \text{ sent} \mid (r_1, \ldots, r_N), n_r(t)) \\
&= \frac{p((s_{i1}, \ldots, s_{iN}) \text{ sent}, (r_1, \ldots, r_N), n_r(t))}{p((r_1, \ldots, r_N), n_r(t))} \\
&= \frac{p((s_{i1}, \ldots, s_{iN}) \text{ sent}, (r_1, \ldots, r_N))\,p(n_r(t))}{p(r_1, \ldots, r_N)\,p(n_r(t))} \\
&= p((s_{i1}, \ldots, s_{iN}) \text{ sent} \mid (r_1, \ldots, r_N)),
\end{aligned} \qquad (5.25)$$

where the third equality follows from the fact that $n_r(t)$ is independent of both $(r_1, \ldots, r_N)$ and of $(s_{i1}, \ldots, s_{iN})$.
This analysis shows that $(r_1, \ldots, r_N)$ is a sufficient statistic for r(t) in detecting $m_i$, in the sense that the
probability of error is minimized by using only this sufficient statistic to estimate the transmitted signal and discarding
the remainder noise. Since $\mathbf{r}$ is a sufficient statistic for the received signal r(t), we call $\mathbf{r}$ the received vector
associated with r(t).



5.1.4 Decision Regions and the Maximum Likelihood Decision Criterion 

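We saw in the previous section that the optimal receiver minimizes error probability by selecting the detector output
$\hat{m}$ that maximizes $1 - P_e = p(\hat{m} \text{ sent} \mid \mathbf{r})$. In other words, given a received vector $\mathbf{r}$, the optimal receiver selects
$\hat{m} = m_i$ corresponding to the constellation $\mathbf{s}_i$ that satisfies $p(\mathbf{s}_i \text{ sent} \mid \mathbf{r}) > p(\mathbf{s}_j \text{ sent} \mid \mathbf{r})\ \forall j \ne i$. Let us define a set
of decision regions $(Z_1, \ldots, Z_M)$ that are subsets of the signal space $\mathbb{R}^N$ by

$$Z_i = (\mathbf{r} : p(\mathbf{s}_i \text{ sent} \mid \mathbf{r}) > p(\mathbf{s}_j \text{ sent} \mid \mathbf{r})\ \forall j \ne i). \qquad (5.26)$$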



Clearly these regions do not overlap. Moreover, they partition the signal space assuming there is no $\mathbf{r} \in \mathbb{R}^N$ for
which $p(\mathbf{s}_i \text{ sent} \mid \mathbf{r}) = p(\mathbf{s}_j \text{ sent} \mid \mathbf{r})$. If such points exist then the signal space is partitioned with decision regions
by arbitrarily assigning such points to either decision region $Z_i$ or $Z_j$. Once the signal space has been partitioned
by decision regions, then for a received vector $\mathbf{r} \in Z_i$ the optimal receiver outputs the message estimate $\hat{m} = m_i$.
Thus, the receiver processing consists of computing the received vector $\mathbf{r}$ from r(t), finding which decision region
$Z_i$ contains $\mathbf{r}$, and outputting the corresponding message $m_i$. This process is illustrated in Figure 5.6, where we
show a two-dimensional signal space with four decision regions $Z_1, \ldots, Z_4$ corresponding to four constellations
$\mathbf{s}_1, \ldots, \mathbf{s}_4$. The received vector $\mathbf{r}$ lies in region $Z_1$, so the receiver will output the message $m_1$ as the best message
estimate given received vector $\mathbf{r}$.

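We now examine the decision regions in more detail. We will abbreviate $p(\mathbf{s}_i \text{ sent} \mid \mathbf{r} \text{ received})$ as $p(\mathbf{s}_i \mid \mathbf{r})$ and
$p(\mathbf{s}_i \text{ sent})$ as $p(\mathbf{s}_i)$. By Bayes rule,

$$p(\mathbf{s}_i \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid \mathbf{s}_i)\,p(\mathbf{s}_i)}{p(\mathbf{r})}. \qquad (5.27)$$

To minimize error probability, the receiver output $\hat{m} = m_i$ corresponds to the constellation $\mathbf{s}_i$ that maximizes
$p(\mathbf{s}_i \mid \mathbf{r})$, i.e. $\mathbf{s}_i$ must satisfy

$$\arg\max_{\mathbf{s}_i} \frac{p(\mathbf{r} \mid \mathbf{s}_i)\,p(\mathbf{s}_i)}{p(\mathbf{r})} = \arg\max_{\mathbf{s}_i} p(\mathbf{r} \mid \mathbf{s}_i)\,p(\mathbf{s}_i), \quad i = 1, \ldots, M, \qquad (5.28)$$

where the second equality follows from the fact that $p(\mathbf{r})$ is not a function of $\mathbf{s}_i$. Assuming equally likely messages
($p(\mathbf{s}_i) = 1/M$), the receiver output $\hat{m} = m_i$ corresponds to the constellation $\mathbf{s}_i$ that satisfies

$$\arg\max_{\mathbf{s}_i} p(\mathbf{r} \mid \mathbf{s}_i), \quad i = 1, \ldots, M. \qquad (5.29)$$

Figure 5.6: Decision Regions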



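Let us define the likelihood function associated with our receiver as

$$L(\mathbf{s}_i) = p(\mathbf{r} \mid \mathbf{s}_i). \qquad (5.30)$$

Given a received vector $\mathbf{r}$, a maximum likelihood receiver outputs $\hat{m} = m_i$ corresponding to the constellation
$\mathbf{s}_i$ that maximizes $L(\mathbf{s}_i)$. Since the log function is increasing in its argument, maximizing $L(\mathbf{s}_i)$ is equivalent to
maximizing the log likelihood function, defined as $l(\mathbf{s}_i) = \log L(\mathbf{s}_i)$. Using (5.24) for $L(\mathbf{s}_i) = p(\mathbf{r} \mid \mathbf{s}_i)$ and dropping
the additive constant $-(N/2)\log(\pi N_0)$, which does not depend on $\mathbf{s}_i$, then yields

$$l(\mathbf{s}_i) = -\frac{1}{N_0}\sum_{j=1}^{N} (r_j - s_{ij})^2 = -\frac{1}{N_0}\|\mathbf{r} - \mathbf{s}_i\|^2. \qquad (5.31)$$

Thus, the log likelihood function $l(\mathbf{s}_i)$ depends only on the distance between the received vector $\mathbf{r}$ and the
constellation point $\mathbf{s}_i$.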
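The maximum likelihood receiver is implemented using the structure shown in Figure 5.4. First $\mathbf{r}$ is computed
from r(t), and then the signal constellation closest to $\mathbf{r}$ is determined as the constellation point $\mathbf{s}_i$ satisfying

$$\arg\min_{\mathbf{s}_i} \frac{1}{N_0}\sum_{j=1}^{N} (r_j - s_{ij})^2 = \arg\min_{\mathbf{s}_i} \frac{1}{N_0}\|\mathbf{r} - \mathbf{s}_i\|^2. \qquad (5.32)$$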



This Sj is determined from the decision region Z % that contains r , where Z % is defined by 



Zi = (r : | |r — Si|| < ||r-Sj|| V? = ± i) i = (5.33) 

Finally, the estimated constellation $\mathbf{s}_i$ is mapped to the estimated message $\hat{m}$, which is output from the receiver. This result is intuitively satisfying, since the receiver decides that the transmitted constellation point is the one closest to the received vector. This maximum likelihood receiver structure is very simple to implement since the decision criterion depends only on vector distances. This structure also minimizes the probability of message error at the receiver output when the transmitted messages are equally likely. However, if the messages and corresponding signal constellations are not equally likely then the maximum likelihood receiver does not minimize error probability: to minimize error probability the decision regions $Z_i$ must be modified to take into account the message probabilities, as indicated in (5.27).
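
The minimum-distance rule of (5.32) and (5.33) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `ml_detect` and the four-point example constellation are assumptions introduced here, not part of the text, and ties are broken arbitrarily in favor of the lowest index.

```python
import math

def ml_detect(r, constellation):
    """Return the 0-based index of the constellation point closest to r.

    With equally likely messages this minimum-distance decision is the
    maximum likelihood decision. Ties go to the lowest index.
    """
    def distance(s):
        return math.sqrt(sum((rj - sj) ** 2 for rj, sj in zip(r, s)))
    return min(range(len(constellation)), key=lambda i: distance(constellation[i]))

# Hypothetical four-point constellation on a circle of radius A.
A = 1.0
points = [(A, 0.0), (0.0, A), (-A, 0.0), (0.0, -A)]
print(ml_detect((0.9, 0.2), points))    # received vector near (A, 0)
print(ml_detect((-0.1, -0.8), points))  # received vector near (0, -A)
```

Because only distances are compared, the common factor $1/N_0$ in (5.32) does not need to be computed.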

An alternate receiver structure is shown in Figure 5.7. This structure makes use of a bank of filters matched to each of the different basis functions. We call a filter with impulse response $\psi(t) = \phi(T - t)$, $0 \le t \le T$ the matched filter to the signal $\phi(t)$, so Figure 5.7 is also called a matched filter receiver. It can be shown that if a given input signal is passed through a filter matched to that signal, the output SNR is maximized. One can also show that the sampled matched filter outputs $(r_1, \ldots, r_N)$ in Figure 5.7 are the same as the $(r_1, \ldots, r_N)$ in Figure 5.4, so the two receivers are equivalent.



Example 5.2: 

For BPSK modulation, find decision regions $Z_1$ and $Z_2$ corresponding to constellations $s_1 = A$ and $s_2 = -A$.
Solution: The signal space is one-dimensional, so $r \in \mathbb{R}$. By (5.33) the decision region $Z_1 \subset \mathbb{R}$ is defined by

$$Z_1 = (r : \|r - A\| < \|r - (-A)\|) = (r : r > 0).$$

Thus, $Z_1$ contains all positive numbers on the real line. Similarly

$$Z_2 = (r : \|r - (-A)\| < \|r - A\|) = (r : r < 0).$$

So $Z_2$ contains all negative numbers on the real line. For $r = 0$ the distance is the same to $s_1 = A$ and $s_2 = -A$ so we arbitrarily assign $r = 0$ to $Z_2$.




Figure 5.7: Matched Filter Receiver Structure.

5.1.5 Error Probability and the Union Bound



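We now analyze the error probability associated with the maximum likelihood receiver structure. For equally likely messages $p(m_i \text{ sent}) = 1/M$ we have

$$\begin{aligned}
P_e &= \sum_{i=1}^{M} p(\mathbf{r} \notin Z_i \mid m_i \text{ sent})\, p(m_i \text{ sent}) \\
&= \frac{1}{M}\sum_{i=1}^{M} p(\mathbf{r} \notin Z_i \mid m_i \text{ sent}) \\
&= 1 - \frac{1}{M}\sum_{i=1}^{M} p(\mathbf{r} \in Z_i \mid m_i \text{ sent}) \\
&= 1 - \frac{1}{M}\sum_{i=1}^{M} \int_{Z_i} p(\mathbf{r} \mid m_i)\, d\mathbf{r} \\
&= 1 - \frac{1}{M}\sum_{i=1}^{M} \int_{Z_i} p(\mathbf{r} = \mathbf{s}_i + \mathbf{n} \mid \mathbf{s}_i)\, d\mathbf{n} \\
&= 1 - \frac{1}{M}\sum_{i=1}^{M} \int_{Z_i - \mathbf{s}_i} p(\mathbf{n})\, d\mathbf{n}.
\end{aligned} \tag{5.34}$$

The integrals in (5.34) are over the $N$-dimensional subset $Z_i \subset \mathbb{R}^N$. We illustrate this error probability calculation in Figure 5.8, where the constellation points $\mathbf{s}_1, \ldots, \mathbf{s}_8$ are equally spaced around a circle with minimum separation $d_{min}$. The probability of correct reception assuming the first symbol is sent, $p(\mathbf{r} \in Z_1 \mid m_1 \text{ sent})$, corresponds to the probability $p(\mathbf{r} = \mathbf{s}_1 + \mathbf{n} \mid \mathbf{s}_1)$ that when noise is added to the transmitted constellation $\mathbf{s}_1$, the resulting vector $\mathbf{r} = \mathbf{s}_1 + \mathbf{n}$ remains in the $Z_1$ region shown by the shaded area.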






Figure 5.8: Error Probability Integral and Its Rotational/Shift Invariance 

Figure 5.8 also indicates that the error probability is invariant to an angle rotation or axis shift of the signal constellation. The right side of the figure indicates a phase rotation of $\theta$ and axis shift of $\mathbf{P}$ relative to the constellation on the left side. Thus, $\mathbf{s}_i' = \mathbf{s}_i e^{j\theta} + \mathbf{P}$. The rotational invariance follows because the noise vector $\mathbf{n} = (n_1, \ldots, n_N)$ has components that are i.i.d. Gaussian random variables with zero mean, thus the polar representation $\mathbf{n} = |\mathbf{n}|e^{j\theta}$ has $\theta$ uniformly distributed, so the noise statistics are invariant to a phase rotation. The shift invariance follows from the fact that if the constellation is shifted by some value $\mathbf{P} \in \mathbb{R}^N$, the decision regions defined by (5.33) are also shifted by $\mathbf{P}$. Let $(\mathbf{s}_i, Z_i)$ denote a constellation point and corresponding decision region before the shift and $(\mathbf{s}_i', Z_i')$ denote the corresponding constellation point and decision region after the shift. It is then straightforward to show that $p(\mathbf{r} = \mathbf{s}_i + \mathbf{n} \in Z_i \mid \mathbf{s}_i) = p(\mathbf{r}' = \mathbf{s}_i' + \mathbf{n} \in Z_i' \mid \mathbf{s}_i')$. Thus, the error probability after an axis shift of the constellation points will remain unchanged.

While (5.34) gives an exact solution to the probability of error, we cannot solve for this error probability in closed form. Therefore, we now investigate the union bound on error probability, which yields a closed form expression that is a function of the distance between signal constellation points. Let $A_{ik}$ denote the event that $\|\mathbf{r} - \mathbf{s}_k\| < \|\mathbf{r} - \mathbf{s}_i\|$ given that the constellation point $\mathbf{s}_i$ was sent. If the event $A_{ik}$ occurs, then the constellation will be decoded in error since the transmitted constellation $\mathbf{s}_i$ is not the closest constellation point to the received vector $\mathbf{r}$. However, event $A_{ik}$ does not necessarily imply that $\mathbf{s}_k$ will be decoded instead of $\mathbf{s}_i$, since there may be another constellation point $\mathbf{s}_l$ with $\|\mathbf{r} - \mathbf{s}_l\| < \|\mathbf{r} - \mathbf{s}_k\| < \|\mathbf{r} - \mathbf{s}_i\|$. The constellation is decoded correctly if $\|\mathbf{r} - \mathbf{s}_i\| < \|\mathbf{r} - \mathbf{s}_k\| \;\forall k \ne i$. Thus

$$P_e(m_i \text{ sent}) = p\left(\bigcup_{\substack{k=1 \\ k \ne i}}^{M} A_{ik}\right) \le \sum_{\substack{k=1 \\ k \ne i}}^{M} p(A_{ik}), \tag{5.35}$$



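where the inequality follows from the union bound on probability.
Let us now consider $p(A_{ik})$ more closely. We have

$$\begin{aligned}
p(A_{ik}) &= p(\|\mathbf{s}_k - \mathbf{r}\| < \|\mathbf{s}_i - \mathbf{r}\| \mid \mathbf{s}_i \text{ sent}) \\
&= p(\|\mathbf{s}_k - (\mathbf{s}_i + \mathbf{n})\| < \|\mathbf{s}_i - (\mathbf{s}_i + \mathbf{n})\|) \\
&= p(\|\mathbf{n} + \mathbf{s}_i - \mathbf{s}_k\| < \|\mathbf{n}\|),
\end{aligned} \tag{5.36}$$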



i.e. the probability of error equals the probability that the noise $\mathbf{n}$ is closer to the vector $\mathbf{s}_k - \mathbf{s}_i$ than to the origin. Recall that the noise has a mean of zero, so it is generally close to the origin. This probability does not depend on the entire noise component $\mathbf{n}$: it only depends on the projection of $\mathbf{n}$ onto the line connecting the origin and the point $\mathbf{s}_k - \mathbf{s}_i$, as shown in Figure 5.9. Given the properties of $\mathbf{n}$, the projection of $\mathbf{n}$ onto this one-dimensional line is a one-dimensional Gaussian random variable $n$ with mean zero and variance $N_0/2$. The event $A_{ik}$ occurs if $n$ is closer to $\mathbf{s}_k - \mathbf{s}_i$ than to zero, i.e. if $n > d_{ik}/2$, where $d_{ik} = \|\mathbf{s}_i - \mathbf{s}_k\|$ equals the distance between constellation points $\mathbf{s}_i$ and $\mathbf{s}_k$. Thus,

$$p(A_{ik}) = p(n > d_{ik}/2) = \int_{d_{ik}/2}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\left[\frac{-v^2}{N_0}\right] dv = Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right). \tag{5.37}$$



Substituting into (5.35) we get

$$P_e(m_i \text{ sent}) \le \sum_{\substack{k=1 \\ k \ne i}}^{M} Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right), \tag{5.38}$$



where the $Q$ function, $Q(z)$, is defined as the probability that a Gaussian random variable $x$ with mean 0 and variance 1 is bigger than $z$:

$$Q(z) = p(x > z) = \int_{z}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx. \tag{5.39}$$

Figure 5.9: Noise Projection



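Summing (5.38) over all possible messages yields the union bound

$$P_e = \sum_{i=1}^{M} p(m_i) P_e(m_i \text{ sent}) \le \frac{1}{M}\sum_{i=1}^{M}\sum_{\substack{k=1 \\ k \ne i}}^{M} Q\left(\frac{d_{ik}}{\sqrt{2N_0}}\right). \tag{5.40}$$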



Note that the $Q$ function cannot be solved for in closed form. It can be obtained from the complementary error function as

$$Q(z) = \frac{1}{2}\operatorname{erfc}\left(\frac{z}{\sqrt{2}}\right). \tag{5.41}$$

We can upper bound $Q(z)$ with the closed form expression

$$Q(z) \le \frac{1}{z\sqrt{2\pi}}\, e^{-z^2/2}, \tag{5.42}$$

and this bound is quite tight for $z \gg 0$.
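
As a numerical check on (5.41) and (5.42), the following Python sketch evaluates the $Q$ function through the standard library's `math.erfc` and compares it with the closed form bound. The function names `q_function` and `q_upper_bound` are illustrative choices, not notation from the text.

```python
import math

def q_function(z):
    """Q(z) = 0.5 * erfc(z / sqrt(2)), as in (5.41)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def q_upper_bound(z):
    """Closed form upper bound (5.42); only meaningful for z > 0."""
    if z <= 0:
        raise ValueError("the bound requires z > 0")
    return math.exp(-z * z / 2.0) / (z * math.sqrt(2.0 * math.pi))

for z in (0.5, 1.0, 2.0, 4.0, 6.0):
    exact = q_function(z)
    bound = q_upper_bound(z)
    # The ratio bound/exact approaches 1 as z grows.
    print(f"z={z:3.1f}  Q(z)={exact:.4e}  bound={bound:.4e}  ratio={bound / exact:.3f}")
```

The printed ratio shows how loose the bound is for small $z$ and how it tightens as $z$ increases.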

Defining the minimum distance of the constellation as $d_{min} = \min_{i,k} d_{ik}$, we can simplify (5.40) with the looser bound

$$P_e \le (M-1)\, Q\left(\frac{d_{min}}{\sqrt{2N_0}}\right). \tag{5.43}$$



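Using (5.42) for the $Q$ function with $z = d_{min}/\sqrt{2N_0}$ yields a closed-form bound

$$P_e \le \frac{M-1}{d_{min}\sqrt{\pi/N_0}} \exp\left[\frac{-d_{min}^2}{4N_0}\right]. \tag{5.44}$$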



Finally, $P_e$ is sometimes approximated as the probability of error associated with constellations at the minimum distance $d_{min}$ multiplied by the number of neighbors at this distance $M_{d_{min}}$:

$$P_e \approx M_{d_{min}}\, Q\left(\frac{d_{min}}{\sqrt{2N_0}}\right). \tag{5.45}$$

This approximation is called the nearest neighbor approximation to $P_e$. When different constellation points have a different number of nearest neighbors or different minimum distances, the bound can be averaged over the bound associated with each constellation point. Note that the nearest neighbor approximation will always be less than the loose bound (5.43) since $M > M_{d_{min}}$, and will also be slightly less than the union bound (5.40), since this approximation does not include the error associated with constellations farther apart than the minimum distance. However, the nearest neighbor approximation is quite close to the exact probability of symbol error at high SNRs, since for $x$ and $y$ large with $x > y$, $Q(x) \ll Q(y)$ due to the exponential falloff of the Gaussian distribution in (5.39). This indicates that the probability of mistaking a constellation point for another point that is not one of its nearest neighbors is negligible at high SNRs. A rigorous derivation for (5.45) is made in [5] and also referenced in [6]. Moreover, [5] indicates that (5.45) captures the performance degradation due to imperfect receiver conditions such as slow carrier drift with an appropriate adjustment of the constants. The appeal of the nearest neighbor bound is that it depends only on the minimum distance in the signal constellation and the number of nearest neighbors for points in the constellation.



Example 5.3: 

Consider a signal constellation in $\mathbb{R}^2$ defined by $\mathbf{s}_1 = (A, 0)$, $\mathbf{s}_2 = (0, A)$, $\mathbf{s}_3 = (-A, 0)$ and $\mathbf{s}_4 = (0, -A)$. Assume $A/\sqrt{N_0} = 4$. Find the minimum distance and the union bound (5.40), looser bound (5.43), closed form bound (5.44), and nearest neighbor approximation (5.45) on $P_e$ for this constellation set.

Solution: The constellation is as depicted in Figure 5.3 with the radius of the circle equal to $A$. By symmetry, we need only consider the error probability associated with one of the constellation points, since it will be the same for the others. We focus on the error associated with transmitting constellation point $\mathbf{s}_1$. The minimum distance to this constellation point is easily computed as $d_{min} = d_{12} = d_{23} = d_{34} = d_{14} = \sqrt{A^2 + A^2} = \sqrt{2A^2}$. The distances to the other constellation points are $d_{13} = d_{24} = 2A$. By symmetry, $P_e(m_i \text{ sent}) = P_e(m_j \text{ sent})$, $j \ne i$, so the union bound simplifies to

$$P_e \le \sum_{j=2}^{4} Q\left(\frac{d_{1j}}{\sqrt{2N_0}}\right) = 2Q(A/\sqrt{N_0}) + Q(\sqrt{2}A/\sqrt{N_0}) = 2Q(4) + Q(\sqrt{32}) = 6.3350 \times 10^{-5}.$$

The looser bound yields

$$P_e \le 3Q(4) = 9.5014 \times 10^{-5},$$

which is roughly a factor of 1.5 looser than the union bound. The closed-form bound, obtained by applying (5.42) to each of the three terms of the looser bound, yields

$$P_e \le \frac{3}{\sqrt{2\pi A^2/N_0}} \exp\left[\frac{-.5A^2}{N_0}\right] = 1.0037 \times 10^{-4},$$

which is about a factor of 1.6 looser than the union bound. Finally, the nearest neighbor approximation yields

$$P_e \approx 2Q(4) = 6.3342 \times 10^{-5},$$

which, as expected, is approximately equal to the union bound.
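
The quantities in this example can be evaluated numerically. The sketch below computes the union bound (5.40), the looser bound (5.43), and the nearest neighbor approximation (5.45) for an arbitrary constellation; the function names and the choice $A = 4$, $N_0 = 1$ (so that $A/\sqrt{N_0} = 4$) are assumptions made for illustration.

```python
import math

def q_function(z):
    """Q(z) via the complementary error function, as in (5.41)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def error_bounds(points, n0):
    """Return (d_min, union bound, looser bound, nearest neighbor approx.)."""
    m = len(points)
    scale = math.sqrt(2.0 * n0)
    pairs = [(i, k) for i in range(m) for k in range(m) if k != i]
    d = {(i, k): distance(points[i], points[k]) for i, k in pairs}
    d_min = min(d.values())
    # Union bound (5.40): average over equally likely transmitted points.
    union = sum(q_function(d[p] / scale) for p in pairs) / m
    # Looser bound (5.43): every term replaced by the worst case.
    loose = (m - 1) * q_function(d_min / scale)
    # Nearest neighbor approximation (5.45), averaged over the points.
    neighbors = sum(1 for p in pairs if math.isclose(d[p], d_min, rel_tol=1e-9)) / m
    nearest = neighbors * q_function(d_min / scale)
    return d_min, union, loose, nearest

A, N0 = 4.0, 1.0
points = [(A, 0.0), (0.0, A), (-A, 0.0), (0.0, -A)]
d_min, union, loose, nearest = error_bounds(points, N0)
print(f"d_min={d_min:.4f} union={union:.4e} loose={loose:.4e} nearest={nearest:.4e}")
```

The ordering nearest neighbor ≤ union bound ≤ looser bound holds for any constellation passed to this function.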



Note that for binary modulation where $M = 2$, there is only one way to make an error and $d_{min}$ is the distance between the two signal constellation points, so the bound (5.43) is exact:

$$P_b = Q\left(\frac{d_{min}}{\sqrt{2N_0}}\right). \tag{5.46}$$

The minimum distance squared in (5.44) and (5.46) is typically proportional to the SNR of the received signal, as discussed in Chapter 6. Thus, error probability is reduced by increasing the received signal power.

Recall that $P_e$ is the probability of a symbol (message) error: $P_e = p(\hat{m} \ne m_i \mid m_i \text{ sent})$, where $m_i$ corresponds to a message with $\log_2 M$ bits. However, system designers are typically more interested in the bit error probability, also called the bit error rate (BER), than in the symbol error probability, since bit errors drive the performance of higher layer networking protocols and end-to-end performance. Thus, we would like to design the mapping of the $M$ possible bit sequences to messages $m_i$, $i = 1, \ldots, M$, so that a symbol error associated with an adjacent decision region, which is the most likely way to make an error, corresponds to only one bit error. With such a mapping, assuming that mistaking a signal constellation for a constellation other than its nearest neighbors has a very low probability, we can make the approximation

$$P_b \approx \frac{P_e}{\log_2 M}. \tag{5.47}$$

The most common form of mapping with this property is called Gray coding, which is discussed in more detail in Section 5.3. Signal space concepts are applicable to any modulation where bits are encoded as one of several possible analog signals, including the amplitude, phase, and frequency modulations discussed below.



5.2 Passband Modulation Principles 

The basic principle of passband digital modulation is to encode an information bit stream into a carrier signal which 
is then transmitted over a communications channel. Demodulation is the process of extracting this information bit 
stream from the received signal. Corruption of the transmitted signal by the channel can lead to bit errors in the 
demodulation process. The goal of modulation is to send bits at a high data rate while minimizing the probability 
of data corruption. 

In general, modulated carrier signals encode information in the amplitude $a(t)$, frequency $f(t)$, or phase $\theta(t)$ of a carrier signal. Thus, the modulated signal can be represented as

$$s(t) = a(t)\cos[2\pi(f_c + f(t))t + \theta(t) + \phi_0] = a(t)\cos(2\pi f_c t + \phi(t) + \phi_0), \tag{5.48}$$

where $\phi(t) = 2\pi f(t)t + \theta(t)$ and $\phi_0$ is the phase offset of the carrier. This representation combines frequency and phase modulation into angle modulation.

We can rewrite the right-hand side of (5.48) in terms of its in-phase and quadrature components (taking the carrier phase offset $\phi_0 = 0$) as:

$$s(t) = a(t)\cos\phi(t)\cos(2\pi f_c t) - a(t)\sin\phi(t)\sin(2\pi f_c t) = s_I(t)\cos(2\pi f_c t) - s_Q(t)\sin(2\pi f_c t), \tag{5.49}$$

where $s_I(t) = a(t)\cos\phi(t)$ is called the in-phase component of $s(t)$ and $s_Q(t) = a(t)\sin\phi(t)$ is called its quadrature component. We can also write $s(t)$ in its complex baseband representation as

$$s(t) = \Re\{u(t)e^{j2\pi f_c t}\}, \tag{5.50}$$

where $u(t) = s_I(t) + js_Q(t)$. This representation, described in more detail in Appendix A, is useful since receivers typically process the in-phase and quadrature signal components separately.
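
The equivalence of the three forms (5.48), (5.49), and (5.50) can be checked numerically at a single time instant. The following Python sketch assumes a zero carrier phase offset and constant amplitude and phase over the instant evaluated; the function name `passband_forms` and the numeric values are illustrative assumptions.

```python
import cmath
import math

def passband_forms(a, phi, fc, t):
    """Evaluate s(t) three ways for amplitude a and phase phi (phi_0 = 0)."""
    carrier = 2.0 * math.pi * fc * t
    s_i = a * math.cos(phi)   # in-phase component
    s_q = a * math.sin(phi)   # quadrature component
    direct = a * math.cos(carrier + phi)                             # as in (5.48)
    in_phase_quadrature = s_i * math.cos(carrier) - s_q * math.sin(carrier)  # (5.49)
    complex_baseband = (complex(s_i, s_q) * cmath.exp(1j * carrier)).real    # (5.50)
    return direct, in_phase_quadrature, complex_baseband

# Hypothetical values: amplitude 1.5, phase 0.4 rad, 1 kHz carrier, t = 0.123 ms.
print(passband_forms(1.5, 0.4, 1000.0, 0.000123))
```

All three returned values agree up to floating point rounding.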

5.3 Amplitude and Phase Modulation 

In amplitude and phase modulation the information bit stream is encoded in the amplitude and/or phase of the transmitted signal. Specifically, over a time interval of $T_s$, $K = \log_2 M$ bits are encoded into the amplitude and/or phase of the transmitted signal $s(t)$, $0 \le t < T_s$. The transmitted signal over this period $s(t) = s_I(t)\cos(2\pi f_c t) - s_Q(t)\sin(2\pi f_c t)$ can be written in terms of its signal space representation as $s(t) = s_{i1}\phi_1(t) + s_{i2}\phi_2(t)$ with basis functions $\phi_1(t) = g(t)\cos(2\pi f_c t + \phi_0)$ and $\phi_2(t) = -g(t)\sin(2\pi f_c t + \phi_0)$, where $g(t)$ is a shaping pulse. To send the $i$th message over the time interval $[kT, (k+1)T)$, we set $s_I(t) = s_{i1}g(t)$ and $s_Q(t) = s_{i2}g(t)$. These in-phase and quadrature signal components are baseband signals with spectral characteristics determined by the pulse shape $g(t)$. In particular, their bandwidth $B$ equals the bandwidth of $g(t)$, and the transmitted signal $s(t)$ is a passband signal with center frequency $f_c$ and passband bandwidth $2B$. In practice we take $B = K_g/T_s$ where $K_g$ depends on the pulse shape: for rectangular pulses $K_g = .5$ and for raised cosine pulses $.5 \le K_g \le 1$, as discussed in Section 5.5. Thus, for rectangular pulses the bandwidth of $g(t)$ is $.5/T_s$ and the bandwidth of $s(t)$ is $1/T_s$. Since the pulse shape $g(t)$ is fixed, the signal constellation for amplitude and phase modulation is defined based on the constellation point: $(s_{i1}, s_{i2}) \in \mathbb{R}^2$, $i = 1, \ldots, M$. The complex baseband representation of $s(t)$ is

$$s(t) = \Re\{x(t)e^{j\phi_0}e^{j2\pi f_c t}\}, \tag{5.51}$$

where $x(t) = s_I(t) + js_Q(t) = (s_{i1} + js_{i2})g(t)$. The constellation point $\mathbf{s}_i = (s_{i1}, s_{i2})$ is called the symbol associated with the $\log_2 M$ bits and $T_s$ is called the symbol time. The bit rate for this modulation is $K$ bits per symbol or $R = \log_2 M/T_s$ bits per second.

There are three main types of amplitude/phase modulation: 

• Pulse Amplitude Modulation (MPAM): information encoded in amplitude only. 

• Phase Shift Keying (MPSK): information encoded in phase only. 

• Quadrature Amplitude Modulation (MQAM): information encoded in both amplitude and phase. 

The number of bits per symbol $K = \log_2 M$, signal constellation $(s_{i1}, s_{i2}) \in \mathbb{R}^2$, $i = 1, \ldots, M$, and choice of pulse shape $g(t)$ determine the digital modulation design. The pulse shape $g(t)$ is designed to improve spectral efficiency and combat ISI, as discussed in Section 5.5 below.

Amplitude and phase modulation over a given symbol period can be generated using the modulator structure shown in Figure 5.10. Note that the basis functions in this figure have an arbitrary phase $\phi_0$ associated with the transmit oscillator. Demodulation over each symbol period is performed using the demodulation structure of Figure 5.11, which is equivalent to the structure of Figure 5.7 for $\phi_1(t) = g(t)\cos(2\pi f_c t + \phi)$ and $\phi_2(t) = -g(t)\sin(2\pi f_c t + \phi)$. Typically the receiver includes some additional circuitry for carrier phase recovery that matches the carrier phase $\phi$ at the receiver to the carrier phase $\phi_0$ at the transmitter¹, which is called coherent detection. If $\phi - \phi_0 = \Delta\phi \ne 0$ then the in-phase branch will have an unwanted term associated with the quadrature branch and vice versa, i.e. $r_1 = s_{i1}\cos(\Delta\phi) + s_{i2}\sin(\Delta\phi) + n_1$ and $r_2 = -s_{i1}\sin(\Delta\phi) + s_{i2}\cos(\Delta\phi) + n_2$, which can result in significant performance degradation. The receiver structure also assumes that the sampling function every $T_s$ seconds is synchronized to the start of the symbol period, which is called synchronization or timing recovery. Receiver synchronization and carrier phase recovery are complex receiver operations that can be highly challenging in wireless environments. These operations are discussed in more detail in Section 5.6. We will assume perfect carrier recovery in our discussion of MPAM, MPSK and MQAM, and therefore set $\phi = \phi_0 = 0$ for their analysis.

5.3.1 Pulse Amplitude Modulation (MPAM) 

We will start by looking at the simplest form of linear modulation, one-dimensional MPAM, which has no quadrature component ($s_{i2} = 0$). For MPAM all of the information is encoded into the signal amplitude $A_i$. The transmitted signal over one symbol time is given by

$$s_i(t) = \Re\{A_i g(t)e^{j2\pi f_c t}\} = A_i g(t)\cos(2\pi f_c t), \quad 0 \le t \le T_s \gg 1/f_c, \tag{5.52}$$

where $A_i = (2i - 1 - M)d$, $i = 1, 2, \ldots, M$ defines the signal constellation, parameterized by the distance $d$ which is typically a function of the signal energy, and $g(t)$ is the pulse shape satisfying (5.12) and (5.13). The minimum distance between constellation points is $d_{min} = \min_{i \ne j} |A_i - A_j| = 2d$. The amplitude of the transmitted signal takes on $M$ different values, which implies that each pulse conveys $\log_2 M = K$ bits per symbol time $T_s$.

¹In fact, an additional phase term of $-2\pi f_c\tau$ will result from a propagation delay of $\tau$ in the channel. Thus, coherent detection requires the receiver phase $\phi = \phi_0 - 2\pi f_c\tau$, as discussed in more detail in Section 5.6.

Figure 5.10: Amplitude/Phase Modulator (in-phase and quadrature branches).

Over each symbol period the MPAM signal associated with the $i$th constellation has energy

$$E_{s_i} = \int_0^{T_s} s_i^2(t)\,dt = \int_0^{T_s} A_i^2 g^2(t)\cos^2(2\pi f_c t)\,dt = A_i^2 \tag{5.53}$$

since the pulse shape must satisfy (5.12)². Note that the energy is not the same for each signal $s_i(t)$, $i = 1, \ldots, M$. Assuming equally likely symbols, the average energy is

$$\bar{E}_s = \frac{1}{M}\sum_{i=1}^{M} A_i^2. \tag{5.54}$$

²Recall from (5.8) that (5.12) and therefore (5.53) are not necessarily exact equalities, but very good approximations for $f_c T_s \gg 1$.

Figure 5.11: Amplitude/Phase Demodulator (Coherent: $\phi = \phi_0$), with in-phase and quadrature branches.



The constellation mapping is usually done by Gray encoding, where the messages associated with signal amplitudes that are adjacent to each other differ by one bit value, as illustrated in Figure 5.12. With this encoding method, if noise causes the demodulation process to mistake one symbol for an adjacent one (the most likely type of error), this results in only a single bit error in the sequence of $K$ bits. Gray codes can be designed for MPSK and square MQAM constellations, but not rectangular MQAM.



M = 4, K = 2:   00   01   11   10   (adjacent points spaced $2d$ apart)

M = 8, K = 3:   000   001   011   010   110   111   101   100   (adjacent points spaced $2d$ apart)

Figure 5.12: Gray Encoding for MPAM.
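
The bit labels in Figure 5.12 coincide with the binary-reflected Gray code, which can be generated as sketched below. The use of this particular construction is an assumption made here for illustration (other Gray mappings exist), and the function name `gray_labels` is hypothetical.

```python
def gray_labels(num_bits):
    """Return the 2**num_bits binary-reflected Gray code labels, in order.

    Label i is i XOR (i >> 1), so consecutive labels differ in exactly one bit.
    """
    return [format(i ^ (i >> 1), "0{}b".format(num_bits)) for i in range(2 ** num_bits)]

def bit_differences(a, b):
    """Number of bit positions in which two equal-length labels differ."""
    return sum(x != y for x, y in zip(a, b))

print(gray_labels(2))  # ['00', '01', '11', '10']
print(gray_labels(3))  # ['000', '001', '011', '010', '110', '111', '101', '100']
```

Assigning the $i$th label to amplitude $A_i$ gives adjacent amplitudes labels that differ in a single bit.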



Example 5.4: 

For $g(t) = \sqrt{2/T_s}$, $0 \le t < T_s$ a rectangular pulse shape, find the average energy of 4PAM modulation.
Solution: For 4PAM the $A_i$ values are $A_i = \{-3d, -d, d, 3d\}$, so the average energy is

$$\bar{E}_s = \frac{d^2}{4}(9 + 1 + 1 + 9) = 5d^2.$$
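
The same calculation can be carried out for any $M$ with a short Python sketch of (5.52) and (5.54). The function names are illustrative assumptions, and $d = 1$ is used so that the result is in units of $d^2$.

```python
def mpam_amplitudes(m, d=1.0):
    """Amplitudes A_i = (2i - 1 - M) d for i = 1, ..., M."""
    return [(2 * i - 1 - m) * d for i in range(1, m + 1)]

def average_energy(amplitudes):
    """Average of A_i**2 over equally likely symbols, as in (5.54)."""
    return sum(a * a for a in amplitudes) / len(amplitudes)

for m in (2, 4, 8, 16):
    print(m, mpam_amplitudes(m), average_energy(mpam_amplitudes(m)))
```

For these amplitudes the average works out to $(M^2 - 1)d^2/3$, which gives $5d^2$ for $M = 4$.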



The decision regions $Z_i$, $i = 1, \ldots, M$ associated with the pulse amplitude $A_i = (2i - 1 - M)d$ for $M = 4$ and $M = 8$ are shown in Figure 5.13. Mathematically, for any $M$, these decision regions are defined by

$$Z_i = \begin{cases} (-\infty, A_i + d) & i = 1 \\ [A_i - d, A_i + d) & 2 \le i \le M - 1 \\ [A_i - d, \infty) & i = M \end{cases}$$



From (5.52) we see that MPAM has only a single basis function $\phi_1(t) = g(t)\cos(2\pi f_c t)$. Thus, the coherent demodulator of Figure 5.11 for MPAM reduces to the demodulator shown in Figure 5.14, where the multithreshold device maps $x$ to a decision region $Z_i$ and outputs the corresponding bit sequence $\hat{m} = m_i = \{b_1, \ldots, b_K\}$.

Figure 5.13: Decision Regions for MPAM

Figure 5.14: Coherent Demodulator for MPAM (with multithreshold device)
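
A minimal sketch of the multithreshold device follows, assuming the sampled demodulator output $x$ is a real number and the decision regions are those defined above for $A_i = (2i - 1 - M)d$. The function name `mpam_detect` is an illustrative assumption.

```python
import math

def mpam_detect(x, m, d):
    """Return the index i in 1..M of the MPAM decision region containing x.

    Interior regions are [A_i - d, A_i + d) with A_i = (2i - 1 - M) d; the two
    outer regions extend to minus and plus infinity.
    """
    i = math.floor((x + m * d) / (2.0 * d)) + 1
    return min(max(i, 1), m)

# 4PAM with d = 1: amplitudes -3, -1, 1, 3 and thresholds at -2, 0, 2.
for x in (-5.0, -2.0, -0.3, 0.5, 2.0, 10.0):
    print(x, mpam_detect(x, 4, 1.0))
```

The floor operation implements all $M - 1$ thresholds at once, and the clamp handles the two unbounded outer regions.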



5.3.2 Phase Shift Keying (MPSK) 

For MPSK all of the information is encoded in the phase of the transmitted signal. Thus, the transmitted signal over one symbol time is given by

$$\begin{aligned}
s_i(t) &= \Re\{Ag(t)e^{j2\pi(i-1)/M}e^{j2\pi f_c t}\}, \quad 0 \le t \le T_s \\
&= Ag(t)\cos\left[2\pi f_c t + \frac{2\pi(i-1)}{M}\right] \\
&= Ag(t)\cos\left[\frac{2\pi(i-1)}{M}\right]\cos 2\pi f_c t - Ag(t)\sin\left[\frac{2\pi(i-1)}{M}\right]\sin 2\pi f_c t.
\end{aligned} \tag{5.55}$$

Thus, the constellation points or symbols $(s_{i1}, s_{i2})$ are given by $s_{i1} = A\cos\left[\frac{2\pi(i-1)}{M}\right]$ and $s_{i2} = A\sin\left[\frac{2\pi(i-1)}{M}\right]$ for $i = 1, \ldots, M$. The pulse shape $g(t)$ satisfies (5.12) and (5.13), and $\theta_i = \frac{2\pi(i-1)}{M}$, $i = 1, 2, \ldots, M = 2^K$ are the different phases in the signal constellation points that convey the information bits. The minimum distance between constellation points is $d_{min} = 2A\sin(\pi/M)$, where $A$ is typically a function of the signal energy. 2PSK is often referred to as binary PSK or BPSK, while 4PSK is often called quadrature phase shift keying (QPSK), and is the same as MQAM with $M = 4$ which is defined below.

All possible transmitted signals $s_i(t)$ have equal energy:

$$E_{s_i} = \int_0^{T_s} s_i^2(t)\,dt = A^2. \tag{5.56}$$

Note that for $g(t) = \sqrt{2/T_s}$, $0 \le t \le T_s$, i.e. a rectangular pulse, this signal has constant envelope, unlike the other amplitude modulation techniques MPAM and MQAM. However, rectangular pulses are spectrally inefficient, and more efficient pulse shapes make MPSK nonconstant envelope. As for MPAM, constellation mapping is usually done by Gray encoding, where the messages associated with signal phases that are adjacent to each other differ by one bit value, as illustrated in Figure 5.15. With this encoding method, mistaking a symbol for an adjacent one causes only a single bit error.

Figure 5.15: Gray Encoding for MPSK.

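The decision regions $Z_i$, $i = 1, \ldots, M$, associated with MPSK for $M = 8$ are shown in Figure 5.16. If we represent $\mathbf{r} = re^{j\theta} \in \mathbb{R}^2$ in polar coordinates then these decision regions for any $M$ are centered on the constellation phases $\theta_i = 2\pi(i-1)/M$ and are defined (with angles taken modulo $2\pi$) by

$$Z_i = \{re^{j\theta} : 2\pi(i - 1.5)/M \le \theta < 2\pi(i - .5)/M\}. \tag{5.57}$$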

From (5.55) we see that MPSK has both in-phase and quadrature components, and thus the coherent demodulator is as shown in Figure 5.11. For the special case of BPSK, the decision regions as given in Example 5.2 simplify to $Z_1 = (r : r > 0)$ and $Z_2 = (r : r < 0)$. Moreover BPSK has only a single basis function $\phi_1(t) = g(t)\cos(2\pi f_c t)$ and, since there is only a single bit transmitted per symbol time $T_s$, the bit time $T_b = T_s$. Thus, the coherent demodulator of Figure 5.11 for BPSK reduces to the demodulator shown in Figure 5.17, where the threshold device maps $x$ to the positive or negative half of the real line, and outputs the corresponding bit value. We have assumed in this figure that the message corresponding to a bit value of 1, $m_1 = 1$, is mapped to constellation point $s_1 = A$ and the message corresponding to a bit value of 0, $m_2 = 0$, is mapped to the constellation point $s_2 = -A$.
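
An MPSK decision can be sketched by quantizing the angle of the received vector to the nearest constellation phase $\theta_i = 2\pi(i-1)/M$. The code below is an illustrative sketch under the assumption of perfect carrier recovery; the function name `mpsk_detect` is hypothetical, and region boundaries are subject to floating point rounding.

```python
import math

def mpsk_detect(r1, r2, m):
    """Return the index i in 1..M whose phase 2*pi*(i-1)/M is nearest to r.

    r1 and r2 are the in-phase and quadrature sampler outputs. Each decision
    region is a wedge of angular width 2*pi/M centered on a constellation phase.
    """
    theta = math.atan2(r2, r1)               # angle in (-pi, pi]
    step = 2.0 * math.pi / m
    return math.floor(theta / step + 0.5) % m + 1

# 8PSK: a received vector slightly rotated away from each constellation point.
for i in range(1, 9):
    angle = 2.0 * math.pi * (i - 1) / 8 + 0.1
    print(i, mpsk_detect(math.cos(angle), math.sin(angle), 8))
```

For $M = 2$ the rule reduces to the BPSK threshold test on the sign of $r_1$.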

5.3.3 Quadrature Amplitude Modulation (MQAM) 

For MQAM, the information bits are encoded in both the amplitude and phase of the transmitted signal. Thus, whereas both MPAM and MPSK have one degree of freedom in which to encode the information bits (amplitude or phase), MQAM has two degrees of freedom. As a result, MQAM is more spectrally efficient than MPAM and MPSK, in that it can encode the largest number of bits per symbol for a given average energy.

The transmitted signal is given by

$$s_i(t) = \Re\{A_i e^{j\theta_i}g(t)e^{j2\pi f_c t}\} = A_i\cos(\theta_i)g(t)\cos(2\pi f_c t) - A_i\sin(\theta_i)g(t)\sin(2\pi f_c t), \quad 0 \le t \le T_s, \tag{5.58}$$

where the pulse shape $g(t)$ satisfies (5.12) and (5.13). The energy in $s_i(t)$ is

$$E_{s_i} = \int_0^{T_s} s_i^2(t)\,dt = A_i^2, \tag{5.59}$$

the same as for MPAM. The distance between any pair of symbols in the signal constellation is

$$d_{ij} = \|\mathbf{s}_i - \mathbf{s}_j\| = \sqrt{(s_{i1} - s_{j1})^2 + (s_{i2} - s_{j2})^2}. \tag{5.60}$$

Figure 5.16: Decision Regions for MPSK

Figure 5.17: Coherent Demodulator for BPSK (multiplication by $\cos(2\pi f_c t)$ followed by a threshold device).



For square signal constellations, where $s_{i1}$ and $s_{i2}$ take values on $(2i - 1 - L)d$, $i = 1, 2, \ldots, L = 2^l$, the minimum distance between signal points reduces to $d_{min} = 2d$, the same as for MPAM. In fact, MQAM with square constellations of size $L^2$ is equivalent to MPAM modulation with constellations of size $L$ on each of the in-phase and quadrature signal components. Common square constellations are 4QAM and 16QAM, which are shown in Figure 5.18 below. These square constellations have $M = 2^{2l} = L^2$ constellation points, which are used to send $2l$ bits/symbol, or $l$ bits per dimension. It can be shown that the average power of a square signal constellation with $l$ bits per dimension, $S_l$, is proportional to $4^l/3$, and it follows that the average power for one more bit per dimension $S_{l+1} \approx 4S_l$. Thus, for square constellations it takes approximately 6 dB more power to send an additional 1 bit/dimension or 2 bits/symbol while maintaining the same minimum distance between constellation points.
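
The 6 dB rule can be checked numerically by building square constellations and averaging the symbol energies $s_{i1}^2 + s_{i2}^2$. This Python sketch uses illustrative function names and takes $d = 1$; it is a numerical check under those assumptions, not a derivation.

```python
import math

def square_qam(l, d=1.0):
    """Square constellation with l bits per dimension (L = 2**l levels per axis)."""
    levels = [(2 * i - 1 - 2 ** l) * d for i in range(1, 2 ** l + 1)]
    return [(x, y) for x in levels for y in levels]

def average_energy(points):
    """Average of s_i1**2 + s_i2**2 over equally likely symbols."""
    return sum(x * x + y * y for x, y in points) / len(points)

previous = None
for l in range(1, 6):
    energy = average_energy(square_qam(l))
    if previous is not None:
        # Power ratio for one more bit per dimension, also expressed in dB.
        ratio = energy / previous
        print(l, energy, round(ratio, 3), round(10.0 * math.log10(ratio), 2))
    previous = energy
```

The ratio between successive constellations starts at 5 (going from 4QAM to 16QAM) and approaches 4, i.e. about 6 dB, as $l$ grows.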

Good constellation mappings can be hard to find for QAM signals, especially for irregular constellation shapes. In particular, it is hard to find a Gray code mapping where all adjacent symbols differ by a single bit. The decision regions $Z_i$, $i = 1, \ldots, M$, associated with MQAM for $M = 16$ are shown in Figure 5.19. From (5.58) we see that MQAM has both in-phase and quadrature components, and thus the coherent demodulator is as shown in Figure 5.11.

Figure 5.18: 4QAM and 16QAM Constellations.

5.3.4 Differential Modulation 

The information in MPSK and MQAM signals is carried in the signal phase. Thus, these modulation techniques require coherent demodulation, i.e. the phase of the transmitted signal carrier $\phi_0$ must be matched to the phase of the receiver carrier $\phi$. Techniques for phase recovery typically require more complexity and cost in the receiver and they are also susceptible to phase drift of the carrier. Moreover, obtaining a coherent phase reference in a rapidly fading channel can be difficult. Issues associated with carrier phase recovery are discussed in more detail in Section 5.6. Due to the difficulties as well as the cost and complexity associated with carrier phase recovery, differential modulation techniques, which do not require a coherent phase reference at the receiver, are generally preferred to coherent modulation for wireless applications.

Differential modulation falls in the more general class of modulation with memory, where the symbol transmitted over time $[kT_s, (k+1)T_s)$ depends on the bits associated with the current message to be transmitted and the bits transmitted over prior symbol times. The basic principle of differential modulation is to use the previous symbol as a phase reference for the current symbol, thus avoiding the need for a coherent phase reference at the receiver. Specifically, the information bits are encoded as the differential phase between the current symbol and the previous symbol. For example, in differential BPSK, referred to as DPSK, if the symbol over time $[(k-1)T_s, kT_s)$ has phase $\theta(k-1) = \theta_i$, $\theta_i \in \{0, \pi\}$, then to encode a 0 bit over $[kT_s, (k+1)T_s)$, the symbol would have phase $\theta(k) = \theta_i$ and to encode a 1 bit the symbol would have phase $\theta(k) = \theta_i + \pi$. In other words, a 0 bit is encoded by no change in phase, whereas a 1 bit is encoded as a phase change of $\pi$. Similarly, in 4PSK modulation with differential encoding, the symbol phase over symbol interval $[kT_s, (k+1)T_s)$ depends on the current information bits over this time interval and the symbol phase over the previous symbol interval. The phase transitions for DQPSK modulation are summarized in Table 5.1. Specifically, suppose the symbol over time $[(k-1)T_s, kT_s)$ has phase $\theta(k-1) = \theta_i$. Then, over symbol time $[kT_s, (k+1)T_s)$, if the information bits are 00, the corresponding symbol would have phase $\theta(k) = \theta_i$, i.e. to encode the bits 00, the symbol from symbol interval $[(k-1)T_s, kT_s)$ is repeated over the next interval $[kT_s, (k+1)T_s)$. If the two information bits to be sent at time interval $[kT_s, (k+1)T_s)$ are 01, then the corresponding symbol has phase $\theta(k) = \theta_i + \pi/2$. For information bits 10 the symbol phase is $\theta(k) = \theta_i - \pi/2$, and for information bits 11 the symbol phase is $\theta(k) = \theta_i + \pi$. We see that the symbol phase over symbol interval $[kT_s, (k+1)T_s)$ depends on the current information bits over this time interval and the symbol phase $\theta_i$ over the previous symbol interval. Note that this mapping of bit sequences to phase transitions ensures that the most likely detection error, that of mistaking a received symbol for one of its nearest neighbors, results in a single bit error. For example, if the bit sequence 00 is encoded in the $k$th symbol then the $k$th symbol has the same phase as the $(k-1)$th symbol. Assume this phase is $\theta_i$. The most likely detection error of the $k$th symbol is to decode it as one of its nearest neighbor symbols, which have phase $\theta_i \pm \pi/2$. But decoding the received symbol with phase $\theta_i \pm \pi/2$ would result in a decoded information sequence of either 01 or 10, i.e. it would differ by a single bit from the original sequence 00. More generally, we can use Gray encoding for the phase transitions in differential MPSK for any $M$, so that a message of all 0 bits results in no phase change, a message with a single 1 bit and the rest 0 bits results in the minimum phase change of $2\pi/M$, a message with two 1 bits and the rest 0 bits results in a phase change of $4\pi/M$, and so forth. Differential encoding is most common for MPSK signals, since the differential mapping is relatively simple. Differential encoding can also be done for MQAM with a more complex differential mapping. Differential encoding of MPSK is denoted by D-MPSK, and for BPSK and QPSK this becomes DPSK and D-QPSK, respectively.

Figure 5.19: Decision Regions for MQAM with M = 16



Bit Sequence    Phase Transition
00              $0$
01              $\pi/2$
10              $-\pi/2$
11              $\pi$

Table 5.1: Mapping for D-QPSK with Gray Encoding



Example 5.5: 

Find the sequence of symbols transmitted using DPSK for the bit sequence 101110 starting at the $k$th symbol time, assuming the transmitted symbol at the $(k-1)$th symbol time was $s(k-1) = Ae^{j\pi}$.

Solution: The first bit, a 1, results in a phase transition of $\pi$, so $s(k) = A$. The next bit, a 0, results in no transition, so $s(k+1) = A$. The next bit, a 1, results in another transition of $\pi$, so $s(k+2) = Ae^{j\pi}$, and so on. The full symbol sequence corresponding to 101110 is $A, A, Ae^{j\pi}, A, Ae^{j\pi}, Ae^{j\pi}$.
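The encoding rule used in this example can be checked mechanically. The following is a minimal sketch, not a reference implementation: the function names, the table names, and the convention of storing a phase as an integer index (phase $= 2\pi \cdot \text{index}/M$) are illustrative assumptions. The D-QPSK table follows the Gray mapping of Table 5.1.

```python
import cmath
import math

# Phase steps in units of 2*pi/M, keyed by the bit group that triggers them.
# (Names and the index convention are assumptions made for this sketch.)
DPSK_STEPS = {(0,): 0, (1,): 1}                              # M = 2: 0 or pi
DQPSK_STEPS = {(0, 0): 0, (0, 1): 1, (1, 0): 3, (1, 1): 2}   # M = 4: 0, pi/2, -pi/2, pi


def differential_encode(bits, steps, m, prev_index):
    """Return phase indices (phase = 2*pi*index/m) of the transmitted symbols."""
    group = len(next(iter(steps)))        # number of bits carried per symbol
    index = prev_index
    indices = []
    for i in range(0, len(bits), group):
        # The new phase is the previous phase plus the step selected by the bits.
        index = (index + steps[tuple(bits[i:i + group])]) % m
        indices.append(index)
    return indices


def to_symbols(indices, m, amplitude=1.0):
    """Map phase indices to complex symbols A * exp(j*2*pi*index/m)."""
    return [amplitude * cmath.exp(2j * math.pi * i / m) for i in indices]


# Example 5.5: the previous symbol A*exp(j*pi) corresponds to index 1 for M = 2.
example_indices = differential_encode([1, 0, 1, 1, 1, 0], DPSK_STEPS, 2, prev_index=1)
example_symbols = to_symbols(example_indices, 2)
```

Index 0 stands for the symbol $A$ and index 1 for $Ae^{j\pi}$, so `example_indices` can be compared entry by entry with the sequence in the solution above.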



The demodulator for differential modulation is shown in Figure 5.20. Assume the transmitted constellation at time $k$ is $s(k) = Ae^{j(\theta(k) + \phi_0)}$. The received vector associated with the sampler outputs is

$$z(k) = r_1(k) + jr_2(k) = Ae^{j(\theta(k) + \phi_0)} + n(k), \qquad (5.61)$$

where $n(k)$ is complex white Gaussian noise. The received vector at the previous time sample $k-1$ is thus

$$z(k-1) = r_1(k-1) + jr_2(k-1) = Ae^{j(\theta(k-1) + \phi_0)} + n(k-1). \qquad (5.62)$$

The phase difference between $z(k)$ and $z(k-1)$ dictates which symbol was transmitted. Consider

$$z(k)z^*(k-1) = A^2e^{j(\theta(k) - \theta(k-1))} + Ae^{j(\theta(k) + \phi_0)}n^*(k-1) + Ae^{-j(\theta(k-1) + \phi_0)}n(k) + n(k)n^*(k-1). \qquad (5.63)$$

In the absence of noise ($n(k) = n(k-1) = 0$) only the first term in (5.63) is nonzero, and this term yields the desired phase difference. The phase comparator in Figure 5.20 extracts this phase difference and outputs the corresponding symbol.
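A short numerical sketch may help show why the unknown carrier phase $\phi_0$ drops out of the product $z(k)z^*(k-1)$. It assumes a noiseless channel and DPSK, and the function names and the threshold rule (a phase difference larger than $\pi/2$ in magnitude is decoded as a 1) are illustrative choices rather than a structure taken from Figure 5.20.

```python
import cmath
import math


def differential_detect(received):
    """Phase of z(k) * conj(z(k-1)) for each k >= 1, in [-pi, pi]."""
    return [cmath.phase(z * z_prev.conjugate())
            for z_prev, z in zip(received, received[1:])]


def dpsk_decide(received):
    """Decode DPSK: a phase change near +/-pi is a 1, a change near 0 is a 0."""
    return [1 if abs(d) > math.pi / 2 else 0 for d in differential_detect(received)]


# Symbol phases for Example 5.5, preceded by the reference symbol with phase pi.
tx_phases = [math.pi, 0.0, 0.0, math.pi, 0.0, math.pi, math.pi]
phi_0 = 0.7          # arbitrary carrier phase offset, unknown to the receiver
amplitude = 2.0
rx = [amplitude * cmath.exp(1j * (p + phi_0)) for p in tx_phases]
decoded = dpsk_decide(rx)
```

Changing `phi_0` to any other value leaves `decoded` unchanged, which is the sense in which differential detection needs no coherent phase reference.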



Figure 5.20: Differential PSK Demodulator (in-phase and quadrature branches).

Differential modulation is less sensitive to a random drift in the carrier phase. However, if the channel has a 
nonzero Doppler frequency, the signal phase can decorrelate between symbol times, making the previous symbol 
a very noisy phase reference. This decorrelation gives rise to an irreducible error floor for differential modulation 
over wireless channels with Doppler, as we shall discuss in Chapter 6. 



5.3.5 Constellation Shaping 

Rectangular and hexagonal constellations have a better power efficiency than the square or circular constellations 
associated with MQAM and MPSK, respectively. These irregular constellations can save up to 1.3 dB of power at 
the expense of increased complexity in the constellation map [18]. The optimal constellation shape is a sphere in 
$N$-dimensional space, which must be mapped to a sequence of constellations in 2-dimensional space in order to be
generated by the modulator shown in Figure 5.10. The general conclusion in [18] is that for uncoded modulation, 
the increased complexity of spherical constellations is not worth their energy gains, since coding can provide much 
better performance at less complexity cost. However, if a complex channel code is already being used and little 
further improvement can be obtained by a more complex code, constellation shaping may obtain around 1 dB of 
additional gain. An in-depth discussion of constellation shaping, as well as constellations that allow a noninteger 
number of bits per symbol, can be found in [18]. 

5.3.6 Quadrature Offset 

A linearly modulated signal with symbol $s_i = (s_{i1}, s_{i2})$ will lie in one of the four quadrants of the signal space. At each symbol time $kT_s$ the transition to a new symbol value in a different quadrant can cause a phase transition of up to 180 degrees, which may cause the signal amplitude to transition through the zero point. These abrupt phase transitions and large amplitude variations can be distorted by nonlinear amplifiers and filters. They are avoided by offsetting the quadrature branch pulse $g(t)$ by half a symbol period, as shown in Figure 5.21. This offset makes the signal less sensitive to distortion during symbol transitions.

Phase modulation with phase offset is usually abbreviated as O-MPSK, where the O indicates the offset. For example, QPSK modulation with quadrature offset is referred to as O-QPSK. O-QPSK has the same spectral properties as QPSK for linear amplification, but has higher spectral efficiency under nonlinear amplification, since the maximum phase transition of the signal is 90 degrees, corresponding to the maximum phase transition in either the in-phase or quadrature branch, but not both simultaneously. Another technique to mitigate the amplitude fluctuations of a 180 degree phase shift, used in the IS-54 standard for digital cellular, is $\pi/4$-QPSK [13]. This technique allows for a maximum phase transition of 135 degrees, versus 90 degrees for offset QPSK and 180 degrees for QPSK. Thus, $\pi/4$-QPSK does not have as good spectral properties as O-QPSK under nonlinear amplification. However, $\pi/4$-QPSK can be differentially encoded, eliminating the need for a coherent phase reference, which is a significant advantage. Using differential encoding with $\pi/4$-QPSK is called $\pi/4$-DQPSK. The $\pi/4$-DQPSK modulation works as follows: the information bits are first differentially encoded as in DQPSK, which yields one of the four QPSK constellation points. Then, every other symbol transmission is shifted in phase by $\pi/4$. This periodic phase shift has a similar effect as the time offset in O-QPSK: it reduces the amplitude fluctuations at symbol transitions, which makes the signal more robust against noise and fading.
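The 135 degree figure quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the D-QPSK phase transitions of Table 5.1 and models "every other symbol shifted by $\pi/4$" as adding $+\pi/4$ or $-\pi/4$ to each symbol-to-symbol transition. The helper names are hypothetical.

```python
import math

# Symbol-to-symbol phase transitions of (D-)QPSK, per Table 5.1.
QPSK_TRANSITIONS = [0.0, math.pi / 2, -math.pi / 2, math.pi]


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def max_transition_deg(transitions):
    """Largest phase jump, in degrees, among the given transitions."""
    return max(abs(math.degrees(wrap(t))) for t in transitions)


# Shifting every other symbol by pi/4 adds +pi/4 when moving from an unshifted
# to a shifted symbol, and -pi/4 when moving back.
PI4_TRANSITIONS = [t + s for t in QPSK_TRANSITIONS for s in (math.pi / 4, -math.pi / 4)]

qpsk_max = max_transition_deg(QPSK_TRANSITIONS)
pi4_max = max_transition_deg(PI4_TRANSITIONS)
```

Because no combined transition equals 180 degrees, the $\pi/4$-shifted signal never passes through the origin of the signal space between symbols.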

5.4 Frequency Modulation 

Frequency modulation encodes information bits into the frequency of the transmitted signal. Specifically, each symbol time $K = \log_2 M$ bits are encoded into the frequency of the transmitted signal $s(t)$, $0 \le t < T_s$, resulting in a transmitted signal $s_i(t) = A\cos(2\pi f_i t + \phi_i)$, where $i$ is the index of the $i$th message corresponding to the $\log_2 M$ bits and $\phi_i$ is the phase associated with the $i$th carrier. The signal space representation is $s_i(t) = \sum_j s_{ij}\phi_j(t)$ where $s_{ij} = A\delta(i - j)$ and $\phi_j(t) = \cos(2\pi f_j t + \phi_j)$, so the basis functions correspond to carriers at different frequencies and only one such basis function is transmitted in each symbol period. The orthogonality of the basis functions requires a minimum separation between different carrier frequencies of $\Delta f = \min_{ij} |f_j - f_i| = .5/T_s$.

Since frequency modulation encodes information in the signal frequency, the transmitted signal s(t) has a 
constant envelope A. Because the signal is constant envelope, nonlinear amplifiers can be used with high power 
efficiency, and the modulated signal is less sensitive to amplitude distortion introduced by the channel or the 
hardware. The price exacted for this robustness is a lower spectral efficiency: because the modulation technique 
is nonlinear, it tends to have a higher bandwidth occupancy than the amplitude and phase modulation techniques 
described in Section 5.3.

Figure 5.21: Modulator with Quadrature Offset.

In its simplest form, frequency modulation over a given symbol period can be generated using the modulator 
structure shown in Figure 5.22. Demodulation over each symbol period is performed using the demodulation 
structure of Figure 5.23. Note that the demodulator of Figure 5.23 requires that the jth carrier signal be matched in 
phase to the jth carrier signal at the transmitter, similar to the coherent phase reference requirement in amplitude 
and phase modulation. An alternate receiver structure that does not require this coherent phase reference will be 
discussed in Section 5.4.3. Another issue in frequency modulation is that the different carriers shown in Figure 5.22 
have different phases, $\phi_i \ne \phi_j$ for $i \ne j$, so that at each symbol time $T_s$ there will be a phase discontinuity in the
transmitted signal. Such discontinuities can significantly increase signal bandwidth. Thus, in practice an alternate 
modulator is used that generates a frequency modulated signal with continuous phase, as will be discussed in 
Section 5.4.2 below. 

5.4.1 Frequency Shift Keying (FSK) and Minimum Shift Keying (MSK) 

In MFSK the modulated signal is given by

$$s_i(t) = A\cos[2\pi f_c t + 2\pi\alpha_i\Delta f_c t + \phi_i], \quad 0 \le t < T_s, \qquad (5.64)$$

where $\alpha_i = (2i - 1 - M)$, $i = 1, 2, \ldots, M = 2^K$. The minimum frequency separation between FSK carriers is thus $2\Delta f_c$. MFSK consists of $M$ basis functions $\phi_i(t) = \sqrt{2/T_s}\cos[2\pi f_c t + 2\pi\alpha_i\Delta f_c t + \phi_i]$, where the $\sqrt{2/T_s}$ is a normalization factor to ensure that $\int_0^{T_s}\phi_i^2(t)\,dt = 1$. Over a given symbol time only one basis function is transmitted through the channel.

A simple way to generate the MFSK signal is as shown in Figure 5.22, where $M$ oscillators are operating at the different frequencies $f_i = f_c + \alpha_i\Delta f_c$ and the modulator switches between these different oscillators each symbol time $T_s$. However, with this implementation there will be a discontinuous phase transition at the switching times due to phase offsets between the oscillators. This discontinuous phase leads to a spectral broadening, which is undesirable. An FSK modulator that maintains continuous phase is discussed in the next section. Coherent detection of MFSK uses the standard structure of Figure 5.4. For binary signaling the structure can be simplified to that shown in Figure 5.24, where the decision device outputs a 1 bit if its input is greater than zero and a 0 bit if its input is less than zero.

Figure 5.23: Frequency Demodulator (Coherent)

MSK is a special case of FSK where the minimum frequency separation is $2\Delta f_c = .5/T_s$. Note that this is the minimum frequency separation so that $\langle s_i(t), s_j(t)\rangle = 0$ over a symbol time, for $i \ne j$. Since signal orthogonality is required for demodulation, $2\Delta f_c = .5/T_s$ is the minimum possible frequency separation in FSK, and therefore it occupies the minimum bandwidth.
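The orthogonality claim can be checked numerically. The sketch below approximates the inner product of two phase-aligned carriers over one symbol period with a midpoint rule. The carrier frequencies (10 and 10.5 cycles per symbol), the sample count, and the function name are assumptions chosen only so that the sum-frequency term also integrates to zero.

```python
import math


def inner_product(f1, f2, ts, phase1=0.0, phase2=0.0, n=20000):
    """Midpoint-rule estimate of the integral over [0, ts] of
    cos(2*pi*f1*t + phase1) * cos(2*pi*f2*t + phase2)."""
    dt = ts / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += (math.cos(2 * math.pi * f1 * t + phase1)
                  * math.cos(2 * math.pi * f2 * t + phase2))
    return total * dt


TS = 1.0
separated_half = inner_product(10.0, 10.5, TS)      # spacing .5/Ts: orthogonal
separated_quarter = inner_product(10.0, 10.25, TS)  # spacing .25/Ts: not orthogonal
carrier_energy = inner_product(10.0, 10.0, TS)      # equals Ts/2 for a unit carrier
```

With spacing $.5/T_s$ the inner product vanishes, while halving the spacing leaves a residual correlation of roughly $1/\pi$ in these units. Note that the result for spacing $.5/T_s$ relies on the two carriers being phase-aligned, as in coherent detection.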



5.4.2 Continuous-Phase FSK (CPFSK) 

A better way to generate MFSK that eliminates the phase discontinuity is to frequency modulate a single carrier with a modulating waveform, as in analog FM. In this case the modulated signal will be given by

$$s_i(t) = A\cos\left[2\pi f_c t + 2\pi\beta\int_{-\infty}^{t} u(\tau)\,d\tau\right] = A\cos[2\pi f_c t + \theta(t)], \qquad (5.65)$$

where $u(t) = \sum_k a_k g(t - kT_s)$ is an MPAM signal modulated with the information bit stream, as described in Section 5.3.1. Clearly the phase $\theta(t)$ is continuous with this implementation. This form of MFSK is therefore called continuous phase FSK, or CPFSK.

Figure 5.24: Demodulator for FSK
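To see why the phase is continuous, it can help to build $\theta(t)$ numerically. The sketch below assumes a rectangular unit-height pulse $g(t)$ of width $T_s$ and accumulates the integral of $u(t)$ sample by sample. The function name, the sampling density, and the choice of $\beta$ are illustrative.

```python
import math


def cpfsk_phase(amplitudes, beta, ts, samples_per_symbol=100):
    """Sampled theta(t) = 2*pi*beta * integral of u(t), where
    u(t) = sum_k a_k g(t - k*ts) and g(t) is a unit rectangle of width ts."""
    dt = ts / samples_per_symbol
    theta = 0.0
    phase = [theta]
    for a in amplitudes:
        for _ in range(samples_per_symbol):
            # The phase advances by a bounded amount each step, so there
            # is never a jump at a symbol boundary.
            theta += 2 * math.pi * beta * a * dt
            phase.append(theta)
    return phase


# Binary example: amplitudes +/-1 and beta = .25/Ts, so each symbol moves
# the phase by +/- pi/2.
theta = cpfsk_phase([1, -1, -1, 1], beta=0.25, ts=1.0)
largest_step = max(abs(b - a) for a, b in zip(theta, theta[1:]))
```

The largest step between adjacent samples shrinks in proportion to the sampling interval, in contrast to the switched-oscillator modulator, where the phase can jump by an arbitrary amount at each symbol boundary.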

By Carson's rule [1], for $\beta$ small the transmission bandwidth of $s(t)$ is approximately

$$B_s \approx M\Delta f_c + 2B_g, \qquad (5.66)$$

where $B_g$ is the bandwidth of the pulse shape $g(t)$ used in the MPAM modulating signal $u(t)$. By comparison, the bandwidth of a linearly modulated waveform with pulse shape $g(t)$ is roughly $B_s \approx 2B_g$. Thus, the spectral occupancy of a CPFSK-modulated signal is larger than that of a linearly modulated signal by $M\Delta f_c \ge .5M/T_s$. The spectral efficiency penalty of CPFSK relative to linear modulation increases with data rate, in particular with the number of bits per symbol $K = \log_2 M$ and with the symbol rate $R_s = 1/T_s$.

Coherent detection of CPFSK can be done symbol-by-symbol or over a sequence of symbols. The sequence 
estimator is the optimal detector since a given symbol depends on previously transmitted symbols, and therefore it 
is optimal to detect all symbols simultaneously. However, sequence detection can be impractical due to the memory 
and computational requirements associated with making decisions based on sequences of symbols. Details on both 
symbol-by-symbol and sequence detectors for coherent demodulation of CPFSK can be found in [10, Chapter 5.3].

5.4.3 Noncoherent Detection of FSK 

The receiver requirement for a coherent phase reference associated with each FSK carrier can be difficult and expensive to meet. The need for a coherent phase reference can be eliminated by detecting the energy of the signal at each frequency: if the $i$th branch has the highest energy of all branches, then the receiver outputs message $m_i$. The modified receiver is shown in Figure 5.25.

Suppose the transmitted signal corresponds to frequency $f_i$:

$$s(t) = A\cos(2\pi f_i t + \phi_i) = A\cos(\phi_i)\cos(2\pi f_i t) - A\sin(\phi_i)\sin(2\pi f_i t), \quad 0 \le t < T_s. \qquad (5.67)$$

The phase $\phi_i$ represents the phase offset between the transmitter and receiver oscillators at frequency $f_i$. The coherent receiver in Figure 5.23 only detects the first term $A\cos(\phi_i)\cos(2\pi f_i t)$ associated with the received signal, which can be close to zero for a phase offset $\phi_i \approx \pm\pi/2$. To get around this problem, in Figure 5.25 the receiver splits the received signal into $M$ branches corresponding to each frequency $f_j$, $j = 1, \ldots, M$. For each carrier frequency $f_j$, $j = 1, \ldots, M$, the received signal is multiplied by a noncoherent in-phase and quadrature carrier at that frequency, integrated over a symbol time, sampled, and then squared. For the $j$th branch the squarer output associated with the in-phase component is denoted as $A_{jI} + n_{jI}$ and the corresponding output associated with the quadrature component is denoted as $A_{jQ} + n_{jQ}$, where $n_{jI}$ and $n_{jQ}$ are due to the noise $n(t)$ at the receiver input.
Figure 5.25: Noncoherent FSK Demodulator



Then if $i = j$, $A_{jI} = A^2\cos^2(\phi_i)$ and $A_{jQ} = A^2\sin^2(\phi_i)$. If $i \ne j$ then $A_{jI} = A_{jQ} = 0$. In the absence of noise, the input to the decision device of the $i$th branch will be $A^2\cos^2(\phi_i) + A^2\sin^2(\phi_i) = A^2$, independent of $\phi_i$, and all other branches will have an input of zero. Thus, over each symbol period, the decision device outputs the bit sequence corresponding to frequency $f_j$ if the $j$th branch has the largest input to the decision device. A similar structure where each branch consists of a filter matched to the carrier frequency followed by an envelope detector and sampler can also be used [2, Chapter 6.8]. Note that the noncoherent receiver of Figure 5.25 still requires accurate synchronization for sampling. Synchronization issues are discussed in Section 5.6.
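The phase independence of the branch energy can be illustrated with a small simulation. This is a sketch under stated assumptions: a noiseless channel, four carriers spaced by $1/T_s$ (so that the branches stay orthogonal for any phase offset), and hypothetical function names. The correlator outputs carry a scale factor of $T_s/2$ that the discussion above absorbs into the notation, so the matching branch yields $(AT_s/2)^2$ here.

```python
import math


def fsk_tone(fi, phase, ts, amplitude=1.0, n=2000):
    """Samples of A*cos(2*pi*fi*t + phase) at the midpoints of n bins on [0, ts]."""
    dt = ts / n
    return [amplitude * math.cos(2 * math.pi * fi * (k + 0.5) * dt + phase)
            for k in range(n)]


def correlate(signal, fj, ts):
    """In-phase and quadrature correlator outputs for the branch at frequency fj."""
    dt = ts / len(signal)
    i_out = q_out = 0.0
    for k, r in enumerate(signal):
        t = (k + 0.5) * dt
        i_out += r * math.cos(2 * math.pi * fj * t) * dt
        q_out += r * math.sin(2 * math.pi * fj * t) * dt
    return i_out, q_out


def branch_energy(signal, fj, ts):
    """Sum of the squared in-phase and quadrature correlator outputs."""
    i_out, q_out = correlate(signal, fj, ts)
    return i_out ** 2 + q_out ** 2


def noncoherent_detect(signal, freqs, ts):
    """Index of the branch with the largest energy."""
    energies = [branch_energy(signal, f, ts) for f in freqs]
    return max(range(len(freqs)), key=energies.__getitem__)


FREQS = [10.0, 11.0, 12.0, 13.0]   # cycles per symbol period (assumed values)
worst_case = fsk_tone(12.0, math.pi / 2, ts=1.0)
decision = noncoherent_detect(worst_case, FREQS, ts=1.0)
```

For the phase offset $\pi/2$ used here the in-phase correlator alone returns approximately zero, which is the failure mode of the coherent receiver described above, while the summed branch energy is unaffected.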



5.5 Pulse Shaping 

For amplitude and phase modulation the bandwidth of the baseband and passband modulated signal is a function of the bandwidth of the pulse shape $g(t)$. If $g(t)$ is a rectangular pulse of width $T_s$, then the envelope of the signal is constant. However, a rectangular pulse has very high spectral sidelobes, which means that signals must use a larger bandwidth to eliminate some of the adjacent channel sidelobe energy. Pulse shaping is a method to reduce sidelobe energy relative to a rectangular pulse; however, the shaping must be done in such a way that intersymbol interference (ISI) between pulses in the received signal is not introduced. Note that prior to sampling the received signal the transmitted pulse $g(t)$ is convolved with the channel impulse response $c(t)$ and the matched filter $g^*(-t)$, so to eliminate ISI prior to sampling we must ensure that the effective received pulse $p(t) = g(t) * c(t) * g^*(-t)$ has no ISI. Since the channel model is AWGN, we assume $c(t) = \delta(t)$ so $p(t) = g(t) * g^*(-t)$; in Chapter 11 we will analyze ISI for more general channel impulse responses $c(t)$. To avoid ISI between samples of the received pulses, the effective pulse shape $p(t)$ must satisfy the Nyquist criterion, which requires that the pulse equal zero at the ideal sampling point associated with past or future symbols:

$$p(kT_s) = \begin{cases} p_0 = p(0), & k = 0 \\ 0, & k \ne 0. \end{cases}$$

In the frequency domain this translates to

$$\sum_{l=-\infty}^{\infty} P(f + l/T_s) = p_0T_s. \qquad (5.68)$$

The following pulse shapes all satisfy the Nyquist criterion.



1. Rectangular pulses: $g(t) = \sqrt{2/T_s}$, $0 \le t \le T_s$, which yields the triangular effective pulse shape

$$p(t) = \begin{cases} 2 + 2t/T_s, & -T_s \le t < 0 \\ 2 - 2t/T_s, & 0 \le t < T_s \\ 0, & \text{else.} \end{cases}$$

This pulse shape leads to constant envelope signals in MPSK, but has lousy spectral properties due to its high sidelobes.

2. Cosine pulses: $p(t) = \sin(\pi t/T_s)$, $0 \le t \le T_s$. Cosine pulses are mostly used in MSK modulation, where the quadrature branch of the PSK modulation has its pulse shifted by $T_s/2$. This leads to a constant amplitude modulation with sidelobe energy that is 10 dB lower than that of rectangular pulses.

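3. Raised Cosine Pulses: These pulses are designed in the frequency domain according to the desired spectral properties. Thus, the pulse $p(t)$ is first specified relative to its Fourier Transform:

$$P(f) = \begin{cases} T_s, & 0 \le |f| \le (1 - \beta)/2T_s \\ \dfrac{T_s}{2}\left[1 - \sin\left(\dfrac{\pi T_s}{\beta}\left(|f| - \dfrac{1}{2T_s}\right)\right)\right], & (1 - \beta)/2T_s \le |f| \le (1 + \beta)/2T_s, \end{cases}$$

where $\beta$ is defined as the rolloff factor, which determines the rate of spectral rolloff, as shown in Figure 5.26. Setting $\beta = 0$ yields a rectangular spectrum $P(f)$, i.e. a sinc pulse in the time domain. The pulse $p(t)$ in the time domain corresponding to $P(f)$ is

$$p(t) = \frac{\sin(\pi t/T_s)}{\pi t/T_s}\,\frac{\cos(\beta\pi t/T_s)}{1 - 4\beta^2t^2/T_s^2}.$$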

The time and frequency domain properties of the Raised Cosine pulse are shown in Figures 5.26-5.27. The tails of this pulse in the time domain decay as $1/t^3$ (faster than for the previous pulse shapes), so a mistiming error in sampling leads to a series of intersymbol interference components that converge. A variation of the Raised Cosine pulse is the Root Cosine pulse, derived by taking the square root of the frequency response for the Raised Cosine pulse. The Root Cosine pulse has better spectral properties than the Raised Cosine pulse, but it decays less rapidly in the time domain, which makes performance degradation due to synchronization errors more severe. Specifically, a mistiming error in sampling leads to a series of intersymbol interference components that may diverge.
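As a concrete check of the Nyquist criterion, the Raised Cosine pulse can be evaluated at the sampling instants $t = kT_s$. The sketch below assumes the standard time-domain form $p(t) = \mathrm{sinc}(t/T_s)\cos(\beta\pi t/T_s)/(1 - 4\beta^2t^2/T_s^2)$ and handles the removable singularity at $t = \pm T_s/(2\beta)$ with its limiting value. The function name and the rolloff values are illustrative.

```python
import math


def raised_cosine(t, ts, beta):
    """Raised Cosine pulse p(t) with symbol period ts and rolloff beta."""
    x = t / ts
    sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * beta * x) ** 2
    if abs(denom) < 1e-10:
        # At t = +/- ts/(2*beta) both cos(.) and the denominator vanish;
        # the ratio tends to pi/4 there (by L'Hopital's rule).
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * beta * x) / denom


TS = 1.0
peak = raised_cosine(0.0, TS, beta=0.35)
isi_samples = [raised_cosine(k * TS, TS, beta=0.35) for k in range(-5, 6) if k != 0]
```

All entries of `isi_samples` are zero up to floating-point rounding, so neighboring symbols contribute no ISI when sampling is perfectly timed. Evaluating the pulse slightly off the sampling instants shows the residual ISI that a timing error introduces.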

Pulse shaping is also used with CPFSK to improve spectral efficiency, specifically in the MPAM signal that is frequency modulated to form the FSK signal. The most common pulse shape used in FSK is the Gaussian pulse shape, defined as

$$g(t) = \frac{\sqrt{\pi}}{\alpha}e^{-\pi^2t^2/\alpha^2}, \qquad (5.69)$$

where $\alpha$ is a parameter that dictates spectral efficiency. The spectrum of $g(t)$, which dictates the spectrum of the FSK signal, is given by

$$G(f) = e^{-\alpha^2f^2}. \qquad (5.70)$$

The parameter $\alpha$ is related to the 3dB bandwidth of $g(t)$, $B_g$, by

$$\alpha = \frac{\sqrt{-\ln\sqrt{.5}}}{B_g}. \qquad (5.71)$$

Clearly making $\alpha$ large results in a higher spectral efficiency.
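The relation between $\alpha$ and the 3dB bandwidth can be verified numerically. The sketch below assumes the Gaussian transform pair $g(t) = (\sqrt{\pi}/\alpha)e^{-\pi^2t^2/\alpha^2}$ and $G(f) = e^{-\alpha^2f^2}$, with $\alpha = \sqrt{-\ln\sqrt{.5}}/B_g$. The function names and the example bandwidth are hypothetical.

```python
import math


def alpha_from_bandwidth(bg):
    """Pulse parameter alpha for a 3 dB bandwidth bg (assumed relation)."""
    return math.sqrt(-math.log(math.sqrt(0.5))) / bg


def gaussian_spectrum(f, alpha):
    """G(f) = exp(-alpha^2 f^2)."""
    return math.exp(-(alpha * f) ** 2)


def gaussian_pulse(t, alpha):
    """g(t) = (sqrt(pi)/alpha) * exp(-pi^2 t^2 / alpha^2)."""
    return math.sqrt(math.pi) / alpha * math.exp(-(math.pi * t / alpha) ** 2)


def pulse_area(alpha, n=6000):
    """Midpoint-rule integral of g(t) over [-3*alpha, 3*alpha]; equals G(0)."""
    dt = 6.0 * alpha / n
    return sum(gaussian_pulse(-3.0 * alpha + (k + 0.5) * dt, alpha) for k in range(n)) * dt


B_G = 2.0                                  # example 3 dB bandwidth (arbitrary units)
alpha = alpha_from_bandwidth(B_G)
power_at_bg = gaussian_spectrum(B_G, alpha) ** 2
```

The power spectrum $|G(f)|^2$ evaluated at $f = B_g$ comes out at one half of its peak, which is what the 3dB definition requires, and the pulse area matches $G(0) = 1$.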
Figure 5.27: Time-Domain Properties of the Raised Cosine Pulse.



When the Gaussian pulse shape is applied to MSK modulation, it is abbreviated as GMSK. In general GMSK signals have a high power efficiency since they have a constant amplitude, and a high spectral efficiency since the Gaussian pulse shape has good spectral properties for large $\alpha$. For this reason GMSK is used in the GSM standard for digital cellular systems. Although this is a good choice for voice modulation, it is not necessarily a good choice for data. The Gaussian pulse shape does not satisfy the Nyquist criterion, and therefore the pulse shape introduces ISI, which increases as $\alpha$ increases. Thus, improving spectral efficiency by increasing $\alpha$ leads to a higher ISI level, thereby creating an irreducible error floor from this self-interference. Since the required BER for voice is relatively high ($P_b \approx 10^{-3}$), the ISI can be fairly high and still maintain this target BER. In fact, it is generally used as a rule of thumb that $B_gT_s = .5$ is a tolerable amount of ISI for voice transmission with GMSK. However, a much lower BER is required for data, which will put more stringent constraints on the maximum $\alpha$ and corresponding minimum $B_g$, thereby decreasing the spectral efficiency of GMSK for data transmission. ISI mitigation techniques such as equalization can be used to reduce the ISI in this case so that a tolerable BER is possible without significantly compromising spectral efficiency.

5.6 Symbol Synchronization and Carrier Phase Recovery



One of the most challenging tasks of a digital demodulator is to acquire accurate symbol timing and carrier phase 
information. Timing information, obtained via synchronization, is needed to delineate the received signal asso- 
ciated with a given symbol. In particular, timing information is used to drive the sampling devices associated 
with the demodulators for amplitude, phase, and frequency demodulation shown in Figures 5.11 and 5.23. Carrier 
phase information is needed in all coherent demodulators for both amplitude/phase and frequency modulation, as 
discussed in Sections 5.3 and 5.4 above. 

This section gives a brief overview of standard techniques for synchronization and carrier phase recovery 
in AWGN channels. In this context the estimation of symbol timing and carrier phase falls under the broader 
category of signal parameter estimation in noise. Estimation theory provides the theoretical framework to study this 
problem and to develop the maximum likelihood estimator of the carrier phase and symbol timing. However, most 
wireless channels suffer from time-varying multipath in addition to AWGN. Synchronization and carrier phase recovery are particularly challenging in such channels since multipath and time variations can make it extremely difficult to estimate signal parameters prior to demodulation. Moreover, there is little theory addressing good methods for parameter estimation of carrier phase and symbol timing when corrupted by time-varying multipath in addition to noise. In most performance analyses of wireless communication systems it is assumed that the receiver synchronizes to the multipath component with delay equal to the average delay spread$^3$, and then the channel is treated as AWGN for recovery of timing information and carrier phase. In practice, however, the receiver will synchronize to either the strongest multipath component or the first multipath component that exceeds a given power threshold. The other multipath components will then compromise the receiver's ability to acquire timing and carrier phase, especially in wideband systems like UWB. Multicarrier and spread spectrum systems have additional considerations related to synchronization and carrier recovery which will be discussed in Chapters 12 and 13, respectively.

The importance of synchronization and carrier phase estimation cannot be overstated: without it wireless systems could not function. Moreover, as data rates increase and channels become more complex by adding additional degrees of freedom (e.g. multiple antennas), the task of receiver synchronization and phase recovery becomes even more complex and challenging. Techniques for synchronization and carrier recovery have been developed and analyzed extensively for many years, and these techniques continually evolve to meet the challenges associated with higher data rates, new system requirements, and more challenging channel characteristics. We give only a brief introduction to synchronization and carrier phase recovery techniques in this section. Comprehensive coverage of this topic as well as performance analysis of these techniques can be found in [19, 20], and more condensed treatments can be found in [7, Chapter 6], [21].

5.6.1 Receiver Structure with Phase and Timing Recovery 

The carrier phase and timing recovery circuitry for the amplitude and phase demodulator is shown in Figure 5.28. 
For BPSK only the in-phase branch of this demodulator is needed. For the coherent frequency demodulator of 
Figure 5.23 a carrier phase recovery circuit is needed for each of the distinct M carriers, and the resulting circuit 
complexity motivates the need for the noncoherent demodulators described in Section 5.4.3. We see in Figure 5.28 
that the carrier phase and timing recovery circuits operate directly on the received signal prior to demodulation. 

Assuming an AWGN channel, the received signal $r(t)$ is a delayed version of the transmitted signal $s(t)$ plus AWGN $n(t)$: $r(t) = s(t - \tau) + n(t)$, where $\tau$ is the random propagation delay. Using the complex baseband form we have $s(t) = \Re\{x(t)e^{j\phi_0}e^{j2\pi f_ct}\}$ and thus

$$r(t) = \Re\left\{\left[x(t - \tau)e^{j\phi} + z(t)\right]e^{j2\pi f_ct}\right\}, \qquad (5.72)$$

where $\phi = \phi_0 - 2\pi f_c\tau$ results from the transmit carrier phase and the propagation delay, and $z(t)$ denotes the complex baseband equivalent of the noise $n(t)$. Estimation of $\tau$ is needed for symbol timing, and estimation of $\phi$ is needed for carrier phase recovery. Let us express these two unknown parameters as a vector $\theta = (\phi, \tau)$. Then we can express the received signal in terms of $\theta$ as

$$r(t) = s(t; \theta) + n(t). \qquad (5.73)$$

$^3$ That is why delay spread is typically characterized by its rms value about its mean, as discussed in more detail in Chapter 2.

Figure 5.28: Receiver Structure with Carrier and Timing Recovery.



Parameter estimation must take place over some finite time interval $T_0 \ge T_s$. We call $T_0$ the observation interval. In practice, however, parameter estimation is done initially over this interval and thereafter estimation is performed continually by updating the initial estimate using tracking loops. Our development below focuses just on the initial parameter estimation over $T_0$; discussion of parameter tracking can be found in [19, 20].

There are two common estimation methods for signal parameters in noise: the maximum-likelihood (ML) criterion, discussed in Section 5.1.4 in the context of receiver design, and the maximum a posteriori (MAP) criterion. The ML criterion chooses the estimate $\hat\theta$ that maximizes $p(r(t)|\theta)$ over the observation interval $T_0$, whereas the MAP criterion assumes some probability distribution on $\theta$, $p(\theta)$, and chooses the estimate $\hat\theta$ that maximizes

$$p(\theta|r(t)) = \frac{p(r(t)|\theta)\,p(\theta)}{p(r(t))}$$

over $T_0$. We assume that there is no prior knowledge of $\theta$, so that $p(\theta)$ becomes uniform and therefore the MAP and ML criteria are equivalent.

To characterize the distribution $p(r(t)|\theta)$, $0 \le t < T_0$, let us expand $r(t)$ over the observation interval along a set of orthonormal basis functions $\{\phi_k(t)\}$ as

$$r(t) = \sum_{k=1}^{K} r_k\phi_k(t), \quad 0 \le t \le T_0.$$

Since $n(t)$ is white with zero mean and power spectral density $N_0/2$, the pdf of the vector $\mathbf{r} = (r_1, \ldots, r_K)$ conditioned on the unknown parameter $\theta$ is given by

$$p(\mathbf{r}|\theta) = \left(\frac{1}{\sqrt{\pi N_0}}\right)^K \exp\left[-\sum_{k=1}^{K}\frac{(r_k - s_k(\theta))^2}{N_0}\right], \qquad (5.74)$$

where by the basis expansion

$$r_k = \int_{T_0} r(t)\phi_k(t)\,dt,$$

and we define

$$s_k(\theta) = \int_{T_0} s(t; \theta)\phi_k(t)\,dt.$$

We can show that

$$\sum_{k=1}^{K}[r_k - s_k(\theta)]^2 = \int_{T_0}[r(t) - s(t; \theta)]^2\,dt. \qquad (5.75)$$

Using this in (5.74) yields that maximizing $p(\mathbf{r}|\theta)$ is equivalent to maximizing the likelihood function

$$\Lambda(\theta) = \exp\left[-\frac{1}{N_0}\int_{T_0}[r(t) - s(t; \theta)]^2\,dt\right]. \qquad (5.76)$$



Maximization of the likelihood function (5.76) results in the joint ML estimate of the carrier phase and symbol 
timing. ML estimation of the carrier phase and symbol timing can also be done separately. In subsequent sections 
we will discuss the separate estimation of carrier phase and symbol timing in more detail. Techniques for joint 
estimation are more complex; details of such techniques can be found in [19, Chapters 8-9], [7, Chapter 6.4].

5.6.2 Maximum Likelihood Phase Estimation 

In this section we derive the maximum likelihood phase estimate assuming the timing is known. The likelihood function (5.76) with timing known reduces to

$$\Lambda(\phi) = \exp\left[-\frac{1}{N_0}\int_{T_0}[r(t) - s(t; \phi)]^2\,dt\right] = \exp\left[-\frac{1}{N_0}\int_{T_0}r^2(t)\,dt + \frac{2}{N_0}\int_{T_0}r(t)s(t; \phi)\,dt - \frac{1}{N_0}\int_{T_0}s^2(t; \phi)\,dt\right]. \qquad (5.77)$$

We estimate the carrier phase as the value $\hat\phi$ that maximizes this function. Note that the first term in (5.77) is independent of $\phi$. Moreover, we assume that the third integral, which measures the energy in $s(t; \phi)$ over the observation interval, is relatively constant in $\phi$. With these observations we see that the $\hat\phi$ that maximizes (5.77) also maximizes

$$\Lambda'(\phi) = \int_{T_0} r(t)s(t; \phi)\,dt. \qquad (5.78)$$

We can solve directly for the maximizing $\hat\phi$ in the simple case where the received signal is just an unmodulated carrier plus noise: $r(t) = A\cos(2\pi f_ct + \phi) + n(t)$. Then $\hat\phi$ must maximize

$$\Lambda'(\phi) = \int_{T_0} r(t)\cos(2\pi f_ct + \phi)\,dt. \qquad (5.79)$$

Differentiating $\Lambda'(\phi)$ relative to $\phi$ and setting it to zero yields that $\hat\phi$ satisfies

$$\int_{T_0} r(t)\sin(2\pi f_ct + \hat\phi)\,dt = 0. \qquad (5.80)$$

Solving (5.80) for $\hat\phi$ yields

$$\hat\phi = -\tan^{-1}\left[\frac{\int_{T_0} r(t)\sin(2\pi f_ct)\,dt}{\int_{T_0} r(t)\cos(2\pi f_ct)\,dt}\right]. \qquad (5.81)$$
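The closed-form estimate can be tried on a sampled carrier. The sketch below approximates the two integrals with a midpoint rule and, as a deliberate substitution, uses the two-argument arctangent `math.atan2` in place of $\tan^{-1}$ so that the root of (5.80) corresponding to the maximum of $\Lambda'(\phi)$ is selected over the full range $(-\pi, \pi]$. The function names, sample counts, and noise level are assumptions made for illustration.

```python
import math
import random


def carrier(phi, fc, t0, n=4000, amplitude=1.0, noise_std=0.0, seed=0):
    """Samples of A*cos(2*pi*fc*t + phi) plus optional Gaussian noise on [0, t0]."""
    rng = random.Random(seed)
    dt = t0 / n
    return [amplitude * math.cos(2 * math.pi * fc * (k + 0.5) * dt + phi)
            + (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0)
            for k in range(n)]


def ml_phase_estimate(samples, fc, t0):
    """Phase estimate from the ratio of the sine and cosine correlations."""
    dt = t0 / len(samples)
    sin_corr = cos_corr = 0.0
    for k, r in enumerate(samples):
        t = (k + 0.5) * dt
        sin_corr += r * math.sin(2 * math.pi * fc * t) * dt
        cos_corr += r * math.cos(2 * math.pi * fc * t) * dt
    # atan2 keeps the quadrant information that a plain arctangent discards.
    return -math.atan2(sin_corr, cos_corr)


clean_estimate = ml_phase_estimate(carrier(0.6, fc=10.0, t0=1.0), fc=10.0, t0=1.0)
noisy_estimate = ml_phase_estimate(
    carrier(0.6, fc=10.0, t0=1.0, noise_std=0.5), fc=10.0, t0=1.0)
```

The observation interval here spans an integer number of carrier cycles, so the double-frequency terms integrate to zero and the noiseless estimate recovers the true phase to within numerical precision.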



While we can build a circuit to compute (5.81) from the received signal $r(t)$, in practice carrier phase recovery is done using a phase lock loop to satisfy (5.80), as shown in Figure 5.29. In this figure the integrator input in the absence of noise is given by $e(t) = r(t)\sin(2\pi f_ct + \hat\phi)$, and the integrator output is

$$z(t) = \int_{T_0} r(t)\sin(2\pi f_ct + \hat\phi)\,dt,$$

which is precisely the left hand side of (5.80). Thus, if $z(t) = 0$ then the estimate $\hat\phi$ is the maximum-likelihood estimate for $\phi$. If $z(t) \ne 0$ then the VCO adjusts its phase estimate $\hat\phi$ up or down depending on the polarity of $z(t)$: for $z(t) > 0$ it decreases $\hat\phi$ to reduce $z(t)$, and for $z(t) < 0$ it increases $\hat\phi$ to increase $z(t)$. In practice the integrator in Figure 5.29 is replaced with a loop filter whose output $.5A\sin(\hat\phi - \phi) \approx .5A(\hat\phi - \phi)$ is a function of the low-frequency component of its input $e(t) = A\cos(2\pi f_ct + \phi)\sin(2\pi f_ct + \hat\phi) = .5A\sin(\hat\phi - \phi) + .5A\sin(4\pi f_ct + \phi + \hat\phi)$. The above discussion of the PLL operation assumes that $\hat\phi \approx \phi$ since otherwise the polarity of $z(t)$ may not indicate the correct phase adjustment, i.e. we would not necessarily have $\sin(\hat\phi - \phi) \approx \hat\phi - \phi$. The PLL typically exhibits some transient behavior in its initial estimation of the carrier phase. The advantage of a PLL is that it continually adjusts its estimate $\hat\phi$ to maintain $z(t) = 0$, which corrects for slow phase variations due to oscillator drift at the transmitter or changes in the propagation delay. In fact the PLL is an example of a feedback control loop. More details on the PLL and its performance can be found in [7, 19].
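The feedback behavior can be sketched with an idealized discrete-time model. The assumptions are substantial and worth stating: the loop filter is taken to remove the double-frequency term perfectly, leaving $.5A\sin(\hat\phi - \phi)$, the VCO is modeled as a simple accumulator, and the loop gain and step count are arbitrary illustrative values. This is a first-order toy loop, not a design for the circuit in the figure.

```python
import math


def pll_track(phi, phi_hat0=0.0, amplitude=1.0, gain=0.2, steps=200):
    """Iterate phi_hat <- phi_hat - gain * z, with z = .5*A*sin(phi_hat - phi)."""
    phi_hat = phi_hat0
    history = []
    for _ in range(steps):
        z = 0.5 * amplitude * math.sin(phi_hat - phi)   # idealized loop-filter output
        # z > 0 means the estimate is too large, so it is decreased;
        # z < 0 means it is too small, so it is increased.
        phi_hat -= gain * z
        history.append(phi_hat)
    return history


trajectory = pll_track(phi=0.8)
final_estimate = trajectory[-1]
```

Starting from an initial error of 0.8 radians, the estimate moves monotonically toward the true phase and the error shrinks by roughly a constant factor per step once it is small, which is the transient behavior mentioned above. An initial error near $\pm\pi$ converges much more slowly, consistent with the requirement that $\hat\phi \approx \phi$.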
Figure 5.29: Phase Lock Loop for Carrier Phase Recovery (Unmodulated Carrier)

The PLL derivation is for an unmodulated carrier, yet amplitude and phase modulation embed the message
bits into the amplitude and phase of the carrier. For such signals there are two common carrier phase recovery
approaches to deal with the effect of the data sequence on the received signal: the data sequence is either assumed
known or it is treated as random such that the phase estimate is averaged over the data statistics. The first scenario
is referred to as decision-directed parameter estimation, and this scenario typically results from sending a known
training sequence. The second scenario is referred to as non decision-directed parameter estimation. With this
technique the likelihood function (5.77) is maximized by averaging over the statistics of the data. One
decision-directed technique uses data decisions to remove the modulation of the received signal: the resulting unmodulated
carrier is then passed through a PLL. This basic structure is called a decision-feedback PLL since data decisions
are fed back into the PLL for processing. The structure of a non decision-directed carrier phase recovery loop
depends on the underlying distribution of the data. For large constellations most distributions lead to highly
nonlinear functions of the parameter to be estimated. In this case the symbol distribution is often assumed to
be Gaussian along each signal dimension, which greatly simplifies the recovery loop structure. An alternate non
decision-directed structure takes the $M$th power of the signal ($M = 2$ for PAM and $M$ for MPSK modulation),
passes it through a bandpass filter at frequency $Mf_c$, and then uses a PLL. The nonlinear operation removes the
effect of the amplitude or phase modulation so that the PLL can operate on an unmodulated carrier at frequency
$Mf_c$. Many other structures for both decision-directed and non decision-directed carrier recovery can be used, with
different tradeoffs in performance and complexity. A more comprehensive discussion of the design and performance
of carrier phase recovery can be found in [19], [7, Chapter 6.2.4-6.2.5].
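
As a rough illustration of the $M$th power structure, the sketch below works in complex baseband rather than at the bandpass frequency $Mf_c$, so the bandpass filter and PLL are replaced by a simple average. That substitution is an assumption made for brevity, and the function name `mth_power_phase_estimate`, the noise level, and the number of symbols are illustrative choices. Raising each MPSK sample to the $M$th power maps every data phase $2\pi m/M$ to a multiple of $2\pi$, so only the term $M\phi$ survives.

```python
import cmath
import math
import random

def mth_power_phase_estimate(samples, M):
    """Estimate a common phase offset from MPSK samples via the Mth-power rule."""
    # Raising to the Mth power strips the data phase 2*pi*m/M from every sample.
    acc = sum(x ** M for x in samples)
    # The accumulated phasor has angle M*phi, so divide by M.
    return cmath.phase(acc) / M

random.seed(0)
M = 4          # QPSK (illustrative)
phi = 0.2      # true carrier phase offset in radians (illustrative)
sigma = 0.05   # noise standard deviation per dimension (illustrative)

samples = []
for _ in range(500):
    data_phase = 2 * math.pi * random.randrange(M) / M
    noise = complex(random.gauss(0, sigma), random.gauss(0, sigma))
    samples.append(cmath.exp(1j * (data_phase + phi)) + noise)

phi_hat = mth_power_phase_estimate(samples, M)
print(f"true phase {phi:.3f} rad, estimate {phi_hat:.3f} rad")
```

Because the angle of the $M$th power is divided by $M$, the estimate is only unique modulo $2\pi/M$; the example keeps $|\phi| < \pi/M$ so that this ambiguity does not appear.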



5.6.3 Maximum Likelihood Timing Estimation 

In this section we derive the maximum likelihood estimate of the delay $\tau$ assuming the carrier phase is known. Since
we assume that the phase $\phi$ is known, the timing recovery will not affect downconversion by the carrier shown
in Figure 5.28. Thus, it suffices to consider timing estimation for the in-phase or quadrature baseband equivalent
signals of $r(t)$ and $s(t;\tau)$. We denote the in-phase and quadrature components for $r(t)$ as $r_I(t)$ and $r_Q(t)$ and for
$s(t;\tau)$ as $s_I(t;\tau)$ and $s_Q(t;\tau)$. We focus on the in-phase branch as the timing recovered from this branch can be
used for the quadrature branch. The baseband equivalent in-phase signal is given by

$$s_I(t;\tau) = \sum_k s_I(k)\, g(t - kT_s - \tau) \tag{5.82}$$

where $g(t)$ is the pulse shape and $s_I(k)$ denotes the amplitude associated with the in-phase component of the
message transmitted over the $k$th symbol period. The in-phase baseband equivalent received signal is $r_I(t) =
s_I(t;\tau) + n_I(t)$. As in the case of phase synchronization, there are two categories of timing estimators: those for
which the information symbols output from the demodulator are assumed known (decision-directed estimators),
and those for which this sequence is not assumed known (non decision-directed estimators).

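The likelihood function (5.76) with known phase $\phi$ has a similar form to (5.77), the case of known delay:

$$\Lambda(\tau) = \exp\left[-\frac{1}{N_0}\int_{T_0}\left[r_I(t) - s_I(t;\tau)\right]^2 dt\right] = \exp\left[-\frac{1}{N_0}\int_{T_0} r_I^2(t)\,dt + \frac{2}{N_0}\int_{T_0} r_I(t)s_I(t;\tau)\,dt - \frac{1}{N_0}\int_{T_0} s_I^2(t;\tau)\,dt\right]. \tag{5.83}$$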



Since the first and third terms in (5.83) do not change significantly with $\tau$, the delay estimate $\hat{\tau}$ that maximizes
(5.83) also maximizes

$$\Lambda'(\tau) = \int_{T_0} r_I(t)s_I(t;\tau)\,dt = \sum_k s_I(k)\int_{T_0} r_I(t)g(t - kT_s - \tau)\,dt = \sum_k s_I(k)z_k(\tau), \tag{5.84}$$

where

$$z_k(\tau) = \int_{T_0} r_I(t)g(t - kT_s - \tau)\,dt. \tag{5.85}$$

Differentiating (5.84) relative to $\tau$ and setting it to zero yields that the timing estimate $\hat{\tau}$ must satisfy

$$\sum_k s_I(k)\frac{d}{d\tau}z_k(\tau) = 0. \tag{5.86}$$



For decision-directed estimation, (5.86) gives rise to the estimator shown in Figure 5.30. The input to the
voltage-controlled clock (VCC) is the left-hand side of (5.86). If this input is zero, then the timing estimate $\hat{\tau} = \tau$. If not, the clock (i.e.
the timing estimate $\hat{\tau}$) is adjusted to drive the VCC input to zero. This timing estimation loop is also an example
of a feedback control loop.

Figure 5.30: Decision-Directed Timing Estimation



One structure for non decision-directed timing estimation is the early-late gate synchronizer shown in Figure
5.31. This structure exploits two properties of the autocorrelation of $g(t)$, $R_g(\tau) = \int_0^{T_s} g(t)g(t - \tau)\,dt$, namely its
symmetry ($R_g(\tau) = R_g(-\tau)$) and the fact that its maximum value is at $\tau = 0$. The input to the sampler in the
upper branch of Figure 5.31 is proportional to the autocorrelation $R_g(\hat{\tau} - \tau + \delta) = \int_0^{T_s} g(t - \tau)g(t - \hat{\tau} - \delta)\,dt$ and the
input to the sampler in the lower branch is proportional to the autocorrelation $R_g(\hat{\tau} - \tau - \delta) = \int_0^{T_s} g(t - \tau)g(t - \hat{\tau} + \delta)\,dt$.
If $\hat{\tau} = \tau$ then, since $R_g(\delta) = R_g(-\delta)$, the input to the loop filter will be zero and the voltage-controlled clock
(VCC) will maintain its correct timing estimate. If $\hat{\tau} > \tau$ then $R_g(\hat{\tau} - \tau + \delta) < R_g(\hat{\tau} - \tau - \delta)$, and this negative
input to the VCC will cause it to decrease its estimate $\hat{\tau}$. Conversely, if $\hat{\tau} < \tau$ then $R_g(\hat{\tau} - \tau + \delta) > R_g(\hat{\tau} - \tau - \delta)$,
and this positive input to the VCC will cause it to increase its estimate $\hat{\tau}$.

Figure 5.31: Early-Late Gate Synchronizer
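
The early-late sign logic can be checked with a small numerical sketch. It assumes a unit-energy rectangular pulse, whose autocorrelation is the triangle $R_g(x) = \max(0, 1 - |x|/T_s)$, and it assumes the loop input is the upper-branch value minus the lower-branch value. The function names, the gate offset `delta`, and the step size `mu` are hypothetical choices for illustration, and the loop filter is reduced to a plain gain.

```python
def rg_rect(x, Ts=1.0):
    """Autocorrelation of a unit-energy rectangular pulse of duration Ts."""
    return max(0.0, 1.0 - abs(x) / Ts)

def early_late_error(tau_hat, tau, delta, Ts=1.0):
    """Loop input: R_g(tau_hat - tau + delta) - R_g(tau_hat - tau - delta)."""
    eps = tau_hat - tau
    return rg_rect(eps + delta, Ts) - rg_rect(eps - delta, Ts)

def track(tau, tau_hat0, delta=0.1, mu=0.2, steps=200):
    """Adjust the timing estimate with a first-order loop (plain gain mu)."""
    tau_hat = tau_hat0
    for _ in range(steps):
        # Negative error lowers the estimate, positive error raises it.
        tau_hat += mu * early_late_error(tau_hat, tau, delta)
    return tau_hat

print(early_late_error(0.35, 0.30, 0.1))   # estimate too large: negative input
print(early_late_error(0.25, 0.30, 0.1))   # estimate too small: positive input
print(track(tau=0.30, tau_hat0=0.0))       # settles near the true delay
```

With these assumptions the error is zero only at $\hat{\tau} = \tau$, which is the symmetry argument made in the text.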

More details on these and other structures for decision-directed and non decision-directed timing estimation
as well as their performance tradeoffs can be found in [19], [7, Chapter 6.2.4-6.2.5].

Chapter 5 Problems

1. Show using properties of orthonormal basis functions that if $s_i(t)$ and $s_j(t)$ have constellation points $\mathbf{s}_i$ and
$\mathbf{s}_j$, respectively, then

$$\|\mathbf{s}_i - \mathbf{s}_j\|^2 = \int_0^T \left(s_i(t) - s_j(t)\right)^2 dt.$$

2. Find an alternate set of orthonormal basis functions for the space spanned by $\cos(2\pi t/T)$ and $\sin(2\pi t/T)$.

3. Consider a set of $M$ orthogonal signal waveforms $s_m(t)$, $1 \le m \le M$, $0 \le t \le T$, all of which have the
same energy $\mathcal{E}$. Define a new set of $M$ waveforms as

$$s'_m(t) = s_m(t) - \frac{1}{M}\sum_{i=1}^{M} s_i(t), \quad 1 \le m \le M, \quad 0 \le t \le T.$$

Show that the $M$ signal waveforms $\{s'_m(t)\}$ have equal energy, given by

$$\mathcal{E}' = (M - 1)\mathcal{E}/M.$$

What is the inner product between any two waveforms?

4. Consider the three signal waveforms $\{\phi_1(t), \phi_2(t), \phi_3(t)\}$ shown below.

[Figure: waveforms $\phi_1(t)$, $\phi_2(t)$, $\phi_3(t)$]

(a) Show that these waveforms are orthonormal.

(b) Express the waveform $x(t)$ as a linear combination of $\{\phi_i(t)\}$ and find the coefficients, where $x(t)$ is
given as

$$x(t) = \begin{cases} -1 & (0 \le t < 1) \\ 1 & (1 \le t < 3) \\ 3 & (3 \le t \le 4) \end{cases}$$

5. Consider the four signal waveforms as shown in the figure below.

(a) Determine the dimensionality of the waveforms and a set of basis functions.

(b) Use the basis functions to represent the four waveforms by vectors.

(c) Determine the minimum distance between all the vector pairs.

[Figure: the four signal waveforms $s_1(t), \ldots, s_4(t)$]

6. Derive a mathematical expression for decision regions $Z_i$ that minimize error probability assuming that
messages are not equally likely, i.e. $p(m_i) = p_i$, $i = 1, \ldots, M$, where $p_i$ is not necessarily equal to $1/M$.
Solve for these regions in the case of QPSK modulation with $\mathbf{s}_1 = (A_c, 0)$, $\mathbf{s}_2 = (0, A_c)$, $\mathbf{s}_3 = (-A_c, 0)$ and
$\mathbf{s}_4 = (0, -A_c)$, with $p(\mathbf{s}_1) = p(\mathbf{s}_3) = .2$ and $p(\mathbf{s}_2) = p(\mathbf{s}_4) = .3$.

7. Show that the remainder noise term $n_r(t_k)$ is independent of the correlator outputs $x_i$ for all $i$, i.e. show that
$E[n_r(t_k)x_i] = 0$ $\forall i$. Thus, since $x_j$ conditioned on $\mathbf{s}_i$ and $n_r(t)$ are Gaussian and uncorrelated, they are
independent.



8. Show that if a given input signal is passed through a filter matched to that signal, the output SNR is maxi- 
mized. 

9. Find the matched filters $g(T - t)$, $0 \le t \le T$, and plot $\int_0^T g(t)g(T - t)\,dt$ for the following waveforms:

(a) Rectangular pulse: $g(t) = \sqrt{2/T}$.

(b) Sinc pulse: $g(t) = \mathrm{sinc}(t)$.

(c) Gaussian pulse: $g(t) = \frac{\sqrt{\pi}}{\alpha}e^{-\pi^2 t^2/\alpha^2}$.

10. Show that the ML receiver of Figure 5.4 is equivalent to the matched filter receiver of Figure 5.7 



11. Compute the three bounds (5.40), (5.43), (5.44), and the approximation (5.45) for an asymmetric signal
constellation $\mathbf{s}_1 = (A_c, 0)$, $\mathbf{s}_2 = (0, 2A_c)$, $\mathbf{s}_3 = (-2A_c, 0)$ and $\mathbf{s}_4 = (0, -A_c)$, assuming that $A_c/\sqrt{N_0} = 4$.

12. Find the input to each branch of the decision device in Figure 5.11 if the transmit carrier phase $\phi_0$ differs
from the receiver carrier phase $\phi$ by $\Delta\phi$.

13. Consider a 4-PSK constellation with $d_{min} = \sqrt{2}$. What is the additional energy required to send one extra bit
(8-PSK) while keeping the same minimum distance (and consequently the same bit error probability)?

14. Show that the average power of a square signal constellation with $l$ bits per dimension, $S_l$, is proportional to
$4^l/3$ and that the average power for one more bit per dimension is $S_{l+1} \approx 4S_l$. Find $S_l$ for $l = 2$ and compute
the average energy of MPSK and MPAM constellations with the same number of bits per symbol.

15. For MPSK with differential modulation, let $\Delta\phi$ denote the phase drift of the channel over a symbol time $T_s$.
In the absence of noise, how large must $\Delta\phi$ be to make a detection error?

16. Find the Gray encoding of bit sequences to phase transitions in differential 8PSK. Then find the sequence
of symbols transmitted using differential 8PSK modulation with this Gray encoding for the bit sequence
101110100101110 starting at the $k$th symbol time, assuming the transmitted symbol at the $(k - 1)$th symbol
time is $s(k - 1) = Ae^{j\pi/4}$.

17. Consider the octal signal point constellations in the figure shown below.

[Figure: 8-QAM and 8-PSK signal constellations]

(a) The nearest neighbor signal points in the 8-QAM signal constellation are separated in distance by $A$.
Determine the radii $a$ and $b$ of the inner and outer circles.

(b) The adjacent signal points in the 8-PSK are separated by a distance of $A$. Determine the radius $r$ of the
circle.

(c) Determine the average transmitter powers for the two signal constellations and compare the two pow- 
ers. What is the relative power advantage of one constellation over the other? (Assume that all signal 
points are equally probable.) 

(d) Is it possible to assign three data bits to each point of the signal constellation such that nearest (adjacent) 
points differ in only one bit position? 

(e) Determine the symbol rate if the desired bit rate is 90 Mbps. 

18. The $\pi/4$-QPSK modulation may be considered as two QPSK systems offset by $\pi/4$ radians.

(a) Sketch the signal space diagram for a $\pi/4$-QPSK signal.

(b) Using Gray encoding, label the signal points with the corresponding data bits.

(c) Determine the sequence of symbols transmitted via $\pi/4$-QPSK for the bit sequence 0100100111100101.

(d) Repeat part (c) for $\pi/4$-DQPSK, assuming the last symbol transmitted on the in-phase branch had a
phase of $\pi$ and the last symbol transmitted on the quadrature branch had a phase of $-3\pi/4$.

19. Show that the minimum frequency separation for FSK such that $\cos(2\pi f_j t)$ and $\cos(2\pi f_i t)$ are
orthogonal is $\Delta f = \min_{ij}|f_j - f_i| = .5/T_s$.

20. Show that the Nyquist criterion for zero ISI pulses given by (5.68) is equivalent to the frequency domain 
condition (5.68). 

21. Show that the Gaussian pulse shape does not satisfy the Nyquist criterion.

Chapter 6

Performance of Digital Modulation over
Wireless Channels



We now consider the performance of the digital modulation techniques discussed in the previous chapter when used
over AWGN channels and channels with flat-fading. There are two performance criteria of interest: the probability
of error, defined relative to either symbol or bit errors, and the outage probability, defined as the probability that the
instantaneous signal-to-noise ratio falls below a given threshold. Flat-fading can cause a dramatic increase in either
the average bit-error-rate or the signal outage probability. Wireless channels may also exhibit frequency-selective
fading and Doppler shift. Frequency-selective fading gives rise to intersymbol interference (ISI), which causes an
irreducible error floor in the received signal. Doppler causes spectral broadening, which leads to adjacent channel
interference (typically small at reasonable user velocities), and also to an irreducible error floor in signals with
differential phase encoding (e.g. DPSK), since the phase reference of the previous symbol partially decorrelates
over a symbol time. This chapter describes the impact on digital modulation performance of noise, flat-fading,
frequency-selective fading, and Doppler.

6.1 AWGN Channels 

In this section we define the signal-to-noise power ratio (SNR) and its relation to energy-per-bit ($E_b$) and
energy-per-symbol ($E_s$). We then examine the error probability on AWGN channels for different modulation techniques
as parameterized by these energy metrics. Our analysis uses the signal space concepts of Chapter 5.1.

6.1.1 Signal-to-Noise Power Ratio and Bit/Symbol Energy 

In an AWGN channel the modulated signal $s(t) = \Re\{u(t)e^{j2\pi f_c t}\}$ has noise $n(t)$ added to it prior to reception.
The noise $n(t)$ is a white Gaussian random process with mean zero and power spectral density $N_0/2$. The received
signal is thus $r(t) = s(t) + n(t)$.

We define the received signal-to-noise power ratio (SNR) as the ratio of the received signal power $P_r$ to the
power of the noise within the bandwidth of the transmitted signal $s(t)$. The received power $P_r$ is determined by
the transmitted power and the path loss, shadowing, and multipath fading, as described in Chapters 2-3. The noise
power is determined by the bandwidth of the transmitted signal and the spectral properties of $n(t)$. Specifically, if
the bandwidth of the complex envelope $u(t)$ of $s(t)$ is $B$ then the bandwidth of the transmitted signal $s(t)$ is $2B$.
Since the noise $n(t)$ has uniform power spectral density $N_0/2$, the total noise power within the bandwidth $2B$ is
$N = N_0/2 \times 2B = N_0B$. So the received SNR is given by

$$\mathrm{SNR} = \frac{P_r}{N_0 B}.$$



In systems with interference, we often use the received signal-to-interference-plus-noise power ratio (SINR) in
place of SNR for calculating error probability. If the interference statistics approximate those of Gaussian noise
then this is a reasonable approximation. The received SINR is given by

$$\mathrm{SINR} = \frac{P_r}{N_0 B + P_I},$$

where $P_I$ is the average power of the interference.

The SNR is often expressed in terms of the signal energy per bit $E_b$ or per symbol $E_s$ as

$$\mathrm{SNR} = \frac{P_r}{N_0 B} = \frac{E_s}{N_0 B T_s} = \frac{E_b}{N_0 B T_b}, \tag{6.1}$$

where $T_s$ is the symbol time and $T_b$ is the bit time (for binary modulation $T_s = T_b$ and $E_s = E_b$). For data
pulses with $T_s = 1/B$, e.g. raised cosine pulses with $\beta = 1$, we have $\mathrm{SNR} = E_s/N_0$ for multilevel signaling
and $\mathrm{SNR} = E_b/N_0$ for binary signaling. For general pulses, $T_s = k/B$ for some constant $k$, in which case
$k \cdot \mathrm{SNR} = E_s/N_0$.

The quantities $\gamma_s = E_s/N_0$ and $\gamma_b = E_b/N_0$ are sometimes called the SNR per symbol and the SNR per
bit, respectively. For performance specification, we are interested in the bit error probability $P_b$ as a function of
$\gamma_b$. However, for $M$-ary signaling (e.g. MPAM and MPSK), the bit error probability depends on both the symbol
error probability and the mapping of bits to symbols. Thus, we typically compute the symbol error probability $P_s$
as a function of $\gamma_s$ based on the signal space concepts of Chapter 5.1 and then obtain $P_b$ as a function of $\gamma_b$ using an
exact or approximate conversion. The approximate conversion typically assumes that the symbol energy is divided
equally among all bits, and that Gray encoding is used so that at reasonable SNRs, one symbol error corresponds
to exactly one bit error. These assumptions for $M$-ary signaling lead to the approximations

$$\gamma_b \approx \frac{\gamma_s}{\log_2 M} \tag{6.2}$$

and

$$P_b \approx \frac{P_s}{\log_2 M}. \tag{6.3}$$



6.1.2 Error Probability for BPSK and QPSK 

We first consider BPSK modulation with coherent detection and perfect recovery of the carrier frequency and
phase. With binary modulation each symbol corresponds to one bit, so the symbol and bit error rates are the same.
The transmitted signal is $s_1(t) = Ag(t)\cos(2\pi f_c t)$ to send a 0 bit and $s_2(t) = -Ag(t)\cos(2\pi f_c t)$ to send a 1 bit.
From (5.46) we have that the probability of error is

$$P_b = Q\left(\frac{d_{min}}{\sqrt{2N_0}}\right). \tag{6.4}$$

From Chapter 5, $d_{min} = \|s_1 - s_0\| = \|A - (-A)\| = 2A$. Let us now relate $A$ to the energy-per-bit. We have

$$E_b = \int_0^{T_b} s_1^2(t)\,dt = \int_0^{T_b} s_2^2(t)\,dt = \int_0^{T_b} A^2 g^2(t)\cos^2(2\pi f_c t)\,dt = A^2 \tag{6.5}$$

from (5.56). Thus, the signal constellation for BPSK in terms of energy-per-bit is given by $s_0 = \sqrt{E_b}$ and
$s_1 = -\sqrt{E_b}$. This yields the minimum distance $d_{min} = 2A = 2\sqrt{E_b}$. Substituting this into (6.4) yields

$$P_b = Q\left(\frac{2\sqrt{E_b}}{\sqrt{2N_0}}\right) = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) = Q\left(\sqrt{2\gamma_b}\right). \tag{6.6}$$



QPSK modulation consists of BPSK modulation on both the in-phase and quadrature components of the
signal. With perfect phase and carrier recovery, the received signal components corresponding to each of these
branches are orthogonal. Therefore, the bit error probability on each branch is the same as for BPSK: $P_b =
Q(\sqrt{2\gamma_b})$. The symbol error probability equals the probability that either branch has a bit error:

$$P_s = 1 - \left[1 - Q\left(\sqrt{2\gamma_b}\right)\right]^2. \tag{6.7}$$



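Since the symbol energy is split between the in-phase and quadrature branches, we have $\gamma_s = 2\gamma_b$. Substituting
this into (6.7) yields $P_s$ in terms of $\gamma_s$ as

$$P_s = 1 - \left[1 - Q\left(\sqrt{\gamma_s}\right)\right]^2. \tag{6.8}$$

From Section 5.1.5, the union bound (5.40) on $P_s$ for QPSK is

$$P_s \le 2Q\left(A/\sqrt{N_0}\right) + Q\left(\sqrt{2}A/\sqrt{N_0}\right). \tag{6.9}$$

Writing this in terms of $\gamma_s = 2\gamma_b = A^2/N_0$ yields

$$P_s \le 2Q\left(\sqrt{\gamma_s}\right) + Q\left(\sqrt{2\gamma_s}\right) \le 3Q\left(\sqrt{\gamma_s}\right). \tag{6.10}$$

The closed form bound (5.44) becomes

$$P_s \le \frac{3}{\sqrt{\pi}}\exp\left[\frac{-.5A^2}{N_0}\right] = \frac{3}{\sqrt{\pi}}\exp\left[-\gamma_s/2\right]. \tag{6.11}$$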



Using the fact that the minimum distance between constellation points is $d_{min} = \sqrt{2A^2}$, we get the nearest
neighbor approximation

$$P_s \approx 2Q\left(\sqrt{\frac{A^2}{N_0}}\right) = 2Q\left(\sqrt{\gamma_s}\right). \tag{6.12}$$

Note that with Gray encoding, we can approximate $P_b$ from $P_s$ by $P_b \approx P_s/2$, since we have 2 bits per symbol.



Example 6.1:

Find the bit error probability $P_b$ and symbol error probability $P_s$ of QPSK assuming $\gamma_b = 7$ dB. Compare the exact
$P_b$ with the approximation $P_b \approx P_s/2$ based on the assumption of Gray coding. Finally, compute $P_s$ based on the
nearest neighbor approximation using $\gamma_s = 2\gamma_b$, and compare with the exact $P_s$.

Solution: We have $\gamma_b = 10^{7/10} = 5.012$, so

$$P_b = Q\left(\sqrt{2\gamma_b}\right) = Q\left(\sqrt{10.024}\right) = 7.726 \times 10^{-4}.$$

The exact symbol error probability $P_s$ is

$$P_s = 1 - \left[1 - Q\left(\sqrt{2\gamma_b}\right)\right]^2 = 1 - \left[1 - Q\left(\sqrt{10.024}\right)\right]^2 = 1.545 \times 10^{-3}.$$

The bit-error-probability approximation assuming Gray coding yields $P_b \approx P_s/2 = 7.723 \times 10^{-4}$, which is quite
close to the exact $P_b$. The nearest neighbor approximation to $P_s$ yields

$$P_s \approx 2Q\left(\sqrt{\gamma_s}\right) = 2Q\left(\sqrt{10.024}\right) = 1.545 \times 10^{-3},$$

which matches well with the exact $P_s$.
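
The numbers in Example 6.1 can be reproduced with a few lines of Python. This is a minimal sketch that assumes the standard identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$; the helper name `q_function` and the variable names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

gamma_b = 10 ** (7 / 10)        # 7 dB converted to linear scale
gamma_s = 2 * gamma_b           # QPSK carries 2 bits per symbol

pb_exact = q_function(math.sqrt(2 * gamma_b))   # per-branch BPSK error, (6.6)
ps_exact = 1 - (1 - pb_exact) ** 2              # exact QPSK symbol error, (6.7)
pb_gray = ps_exact / 2                          # Gray-coding approximation
ps_nn = 2 * q_function(math.sqrt(gamma_s))      # nearest neighbor, (6.12)

print(f"Pb exact = {pb_exact:.3e}, Pb from Ps/2 = {pb_gray:.3e}")
print(f"Ps exact = {ps_exact:.3e}, Ps nearest neighbor = {ps_nn:.3e}")
```

The nearest neighbor value exceeds the exact $P_s$ by exactly $P_b^2$, which is negligible at this SNR.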



6.1.3 Error Probability for MPSK 

The signal constellation for MPSK has $s_{i1} = A\cos[2\pi(i-1)/M]$ and $s_{i2} = A\sin[2\pi(i-1)/M]$ for $i = 1, \ldots, M$. The
symbol energy is $E_s = A^2$, so $\gamma_s = A^2/N_0$. From (5.57), for the received vector $\mathbf{x} = re^{j\theta}$ represented in polar
coordinates, an error occurs if the $i$th signal constellation point is transmitted and $\theta \notin (2\pi(i - 1 - .5)/M, 2\pi(i -
1 + .5)/M)$. The joint distribution of $r$ and $\theta$ can be obtained through a bivariate transformation of the noise $n_1$
and $n_2$ on the in-phase and quadrature branches [4, Chapter 5.4], which yields

$$p(r, \theta) = \frac{r}{\pi N_0}\exp\left[-\frac{1}{N_0}\left(r^2 - 2\sqrt{2E_s}\,r\cos\theta + 2E_s\right)\right]. \tag{6.13}$$



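Since the error probability depends only on the distribution of $\theta$, we can integrate out the dependence on $r$, yielding

$$p(\theta) = \int_0^\infty p(r, \theta)\,dr = \frac{1}{\pi}e^{-2\gamma_s\sin^2(\theta)}\int_0^\infty z\exp\left[-\left(z - \sqrt{2\gamma_s}\cos(\theta)\right)^2\right]dz. \tag{6.14}$$

By symmetry, the probability of error is the same for each constellation point. Thus, we can obtain $P_s$ from the
probability of error assuming the constellation point $\mathbf{s}_1 = (A, 0)$ is transmitted, which is

$$P_s = 1 - \int_{-\pi/M}^{\pi/M} p(\theta)\,d\theta = 1 - \int_{-\pi/M}^{\pi/M}\frac{1}{\pi}e^{-2\gamma_s\sin^2(\theta)}\int_0^\infty z\exp\left[-\left(z - \sqrt{2\gamma_s}\cos(\theta)\right)^2\right]dz\,d\theta. \tag{6.15}$$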



A closed-form solution to this integral does not exist for $M > 4$, and hence the exact value of $P_s$ must be computed
numerically.

Each point in the MPSK constellation has two nearest neighbors at distance $d_{min} = 2A\sin(\pi/M)$. Thus, the
nearest neighbor approximation (5.45) to $P_s$ is given by

$$P_s \approx 2Q\left(\sqrt{2}A/\sqrt{N_0} \times \sin(\pi/M)\right) = 2Q\left(\sqrt{2\gamma_s}\sin(\pi/M)\right). \tag{6.16}$$

As shown in the prior example for QPSK, this nearest neighbor approximation can differ from the exact value of
$P_s$ by more than an order of magnitude. However, it is much simpler to compute than the numerical integration
of (6.15) that is required to obtain the exact $P_s$. A tighter approximation for $P_s$ can be obtained by approximating
$p(\theta)$ as

$$p(\theta) \approx \sqrt{\gamma_s/\pi}\cos(\theta)e^{-\gamma_s\sin^2(\theta)}. \tag{6.17}$$

Using this approximation in the left hand side of (6.15) yields

$$P_s \approx 2Q\left(\sqrt{2\gamma_s}\sin(\pi/M)\right). \tag{6.18}$$



Example 6.2:

Compare the probability of bit error for 8PSK and 16PSK assuming $\gamma_b = 15$ dB and using the $P_s$ approximation
given in (6.18) along with the approximations (6.3) and (6.2).

Solution: From (6.2) we have that for 8PSK, $\gamma_s = (\log_2 8) \cdot 10^{15/10} = 94.87$. Substituting this into (6.18) yields

$$P_s \approx 2Q\left(\sqrt{189.74}\sin(\pi/8)\right) = 1.355 \cdot 10^{-7},$$

and using (6.3) we get $P_b = P_s/3 = 4.52 \cdot 10^{-8}$. For 16PSK we have $\gamma_s = (\log_2 16) \cdot 10^{15/10} = 126.49$.
Substituting this into (6.18) yields

$$P_s \approx 2Q\left(\sqrt{252.98}\sin(\pi/16)\right) = 1.916 \cdot 10^{-3},$$

and using (6.3) we get $P_b = P_s/4 = 4.79 \cdot 10^{-4}$. Note that $P_b$ is much larger for 16PSK than for 8PSK for the
same $\gamma_b$. This result is expected, since 16PSK packs more bits per symbol into a given constellation, so for a fixed
energy-per-bit the minimum distance between constellation points will be smaller.
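
A short script can be used to check the figures in Example 6.2. It is a sketch that evaluates the approximation (6.18) together with the conversions (6.2) and (6.3); the function names `mpsk_ps_approx` and `mpsk_pb_approx` are illustrative, and $Q(x)$ is computed from the complementary error function.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mpsk_ps_approx(gamma_s, M):
    """Approximate MPSK symbol error probability, as in (6.18)."""
    return 2 * q_function(math.sqrt(2 * gamma_s) * math.sin(math.pi / M))

def mpsk_pb_approx(gamma_b_db, M):
    """Approximate MPSK bit error probability using (6.2) and (6.3)."""
    bits = math.log2(M)
    gamma_s = bits * 10 ** (gamma_b_db / 10)   # (6.2): gamma_s = gamma_b * log2(M)
    return mpsk_ps_approx(gamma_s, M) / bits   # (6.3): P_b = P_s / log2(M)

for M in (8, 16):
    gamma_s = math.log2(M) * 10 ** 1.5
    print(f"{M}PSK: Ps ~ {mpsk_ps_approx(gamma_s, M):.3e}, "
          f"Pb ~ {mpsk_pb_approx(15, M):.3e}")
```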



The error probability derivation for MPSK assumes that the carrier phase is perfectly known at the receiver.
Under phase estimation error, the distribution $p(\theta)$ used to obtain $P_s$ must incorporate the distribution of the
phase rotation associated with carrier phase offset. This distribution is typically a function of the carrier phase
estimation technique and the SNR. The impact of phase estimation error on coherent modulation is studied in [1,
Appendix C] [2, Chapter 4.3.2] [9, 10]. These works indicate that, as expected, significant phase offset leads to 
an irreducible bit error probability. Moreover, nonbinary signalling is more sensitive than BPSK to phase offset 
due to the resulting cross-coupling between the in-phase and quadrature signal components. The impact of phase 
estimation error can be especially severe in fast fading, where the channel phase changes rapidly due to constructive 
and destructive multipath interference. Even with differential modulation, phase changes over and between symbol 
times can produce irreducible errors [11]. Timing errors can also degrade performance: analysis of timing errors 
in MPSK performance can be found in [2, Chapter 4.3.3][12]. 



6.1.4 Error Probability for MPAM and MQAM 



The constellation for MPAM is $A_i = (2i - 1 - M)d$, $i = 1, 2, \ldots, M$. Each of the $M - 2$ inner constellation
points of this constellation has two nearest neighbors at distance $2d$. The probability of making an error when
sending one of these inner constellation points is just the probability that the noise exceeds $d$ in either direction:
$P_s(s_i) = p(|n| > d)$, $i = 2, \ldots, M - 1$. For the outer constellation points there is only one nearest neighbor, so
an error occurs if the noise exceeds $d$ in one direction only: $P_s(s_i) = p(n > d) = .5p(|n| > d)$, $i = 1, M$. The
probability of error is thus

$$P_s = \frac{1}{M}\sum_{i=1}^{M}P_s(s_i) = \frac{M-2}{M}2Q\left(\sqrt{\frac{2d^2}{N_0}}\right) + \frac{2}{M}Q\left(\sqrt{\frac{2d^2}{N_0}}\right) = \frac{2(M-1)}{M}Q\left(\sqrt{\frac{2d^2}{N_0}}\right). \tag{6.19}$$

From (5.54) the average energy per symbol for MPAM is

$$\bar{E}_s = \frac{1}{M}\sum_{i=1}^{M}A_i^2 = \frac{1}{M}\sum_{i=1}^{M}(2i - 1 - M)^2 d^2 = \frac{1}{3}(M^2 - 1)d^2. \tag{6.20}$$

Thus we can write $P_s$ in terms of the average energy $\bar{E}_s$ as

$$P_s = \frac{2(M-1)}{M}Q\left(\sqrt{\frac{6\bar{\gamma}_s}{M^2 - 1}}\right). \tag{6.21}$$
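
The step from (6.19) to (6.21) can be verified numerically. The sketch below, with illustrative function names and arbitrary test values for $M$, $d$, and $N_0$, computes the average energy directly from the constellation points, checks it against (6.20), and confirms that (6.19) and (6.21) give the same $P_s$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mpam_avg_energy(M, d):
    """Average symbol energy from the points A_i = (2i - 1 - M) d."""
    return sum(((2 * i - 1 - M) * d) ** 2 for i in range(1, M + 1)) / M

def mpam_ps_from_distance(M, d, N0):
    """Symbol error probability in terms of d, as in (6.19)."""
    return 2 * (M - 1) / M * q_function(math.sqrt(2 * d * d / N0))

def mpam_ps_from_snr(M, gamma_s):
    """Symbol error probability in terms of average SNR, as in (6.21)."""
    return 2 * (M - 1) / M * q_function(math.sqrt(6 * gamma_s / (M * M - 1)))

M, d, N0 = 8, 1.0, 0.25           # arbitrary illustrative values
Es = mpam_avg_energy(M, d)
print(Es, (M * M - 1) * d * d / 3)                     # both sides of (6.20)
print(mpam_ps_from_distance(M, d, N0), mpam_ps_from_snr(M, Es / N0))
```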



Consider now MQAM modulation with a square signal constellation of size $M = L^2$. This system can be
viewed as two MPAM systems with signal constellations of size $L$ transmitted over the in-phase and quadrature
signal components, each with half the energy of the original MQAM system. The constellation points in the
in-phase and quadrature branches take values $A_i = (2i - 1 - L)d$, $i = 1, 2, \ldots, L$. The symbol error probability
for each branch of the MQAM system is thus given by (6.21) with $M$ replaced by $L = \sqrt{M}$ and $\bar{\gamma}_s$ equal to the
average energy per symbol in the MQAM constellation:

$$P_{s,\mathrm{branch}} = \frac{2(\sqrt{M} - 1)}{\sqrt{M}}Q\left(\sqrt{\frac{3\bar{\gamma}_s}{M - 1}}\right). \tag{6.22}$$

Note that $\bar{\gamma}_s$ is multiplied by a factor of 3 in (6.22) instead of the factor of 6 in (6.21) since the MQAM constellation
splits its total average energy $\bar{\gamma}_s$ between its in-phase and quadrature branches. The probability of symbol error for
the MQAM system is then

$$P_s = 1 - \left(1 - \frac{2(\sqrt{M} - 1)}{\sqrt{M}}Q\left(\sqrt{\frac{3\bar{\gamma}_s}{M - 1}}\right)\right)^2. \tag{6.23}$$



The nearest neighbor approximation to probability of symbol error depends on whether the constellation point is
an inner or outer point. If we average the nearest neighbor approximation over all inner and outer points, we obtain
the MPAM probability of error associated with each branch:

$$P_s \approx \frac{2(\sqrt{M} - 1)}{\sqrt{M}}Q\left(\sqrt{\frac{3\bar{\gamma}_s}{M - 1}}\right). \tag{6.24}$$

For nonrectangular constellations, it is relatively straightforward to show that the probability of symbol error
is upper bounded as

$$P_s \le 1 - \left[1 - 2Q\left(\sqrt{\frac{3\bar{\gamma}_s}{M - 1}}\right)\right]^2. \tag{6.25}$$

The nearest neighbor approximation for nonrectangular constellations is

$$P_s \approx M_{d_{min}}Q\left(\frac{d_{min}}{\sqrt{2N_0}}\right), \tag{6.26}$$

where $M_{d_{min}}$ is the largest number of nearest neighbors for any constellation point in the constellation and $d_{min}$
is the minimum distance in the constellation.



Example 6.3:

For 16QAM with $\gamma_b = 15$ dB ($\gamma_s = \log_2 M \cdot \gamma_b$), compare the exact probability of symbol error (6.23) with the
nearest neighbor approximation (6.24), and with the symbol error probability for 16PSK with the same $\gamma_b$ that was
obtained in the previous example.

Solution: The average symbol energy is $\gamma_s = 4 \cdot 10^{1.5} = 126.49$. The exact $P_s$ is then given by

$$P_s = 1 - \left(1 - \frac{2(4 - 1)}{4}Q\left(\sqrt{\frac{3 \cdot 126.49}{15}}\right)\right)^2 = 7.37 \cdot 10^{-7}.$$

The nearest neighbor approximation is given by

$$P_s \approx \frac{2(4 - 1)}{4}Q\left(\sqrt{\frac{3 \cdot 126.49}{15}}\right) = 3.68 \cdot 10^{-7},$$

which differs by roughly a factor of 2 from the exact value. The symbol error probability for 16PSK in the previous
example is $P_s \approx 1.916 \cdot 10^{-3}$, which is roughly four orders of magnitude larger than the exact $P_s$ for 16QAM.
The larger $P_s$ for MPSK versus MQAM with the same $M$ and same $\gamma_b$ is due to the fact that MQAM uses both
amplitude and phase to encode data, whereas MPSK uses just the phase. Thus, for the same energy per symbol or
bit, MQAM makes more efficient use of energy and thus has better performance.
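
The values in Example 6.3 can be checked with the sketch below, which evaluates the per-branch expression (6.22), the exact square-MQAM result (6.23), and the 16PSK approximation (6.18) for comparison. The function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mqam_branch_ps(gamma_s, M):
    """Per-branch symbol error probability for square MQAM, as in (6.22)."""
    root_m = math.sqrt(M)
    return 2 * (root_m - 1) / root_m * q_function(math.sqrt(3 * gamma_s / (M - 1)))

def mqam_ps_exact(gamma_s, M):
    """Symbol error probability for square MQAM, as in (6.23)."""
    return 1 - (1 - mqam_branch_ps(gamma_s, M)) ** 2

M = 16
gamma_s = math.log2(M) * 10 ** 1.5      # gamma_b = 15 dB
ps_exact = mqam_ps_exact(gamma_s, M)
ps_nn = mqam_branch_ps(gamma_s, M)      # the approximation used in the example
ps_16psk = 2 * q_function(math.sqrt(2 * gamma_s) * math.sin(math.pi / 16))

print(f"16QAM exact Ps = {ps_exact:.3e}, approximation = {ps_nn:.3e}")
print(f"16PSK Ps (6.18) = {ps_16psk:.3e}, ratio to 16QAM = {ps_16psk / ps_exact:.1f}")
```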



The MQAM demodulator requires both amplitude and phase estimates of the channel so that the decision
regions used in detection to estimate the transmitted bit are not skewed in amplitude or phase. The analysis of
the performance degradation due to phase estimation error is similar to the case of MPSK discussed above. The
channel amplitude is used to scale the decision regions to correspond to the transmitted symbol: this scaling is
called Automatic Gain Control (AGC). If the channel gain is estimated in error then the AGC improperly scales
the received signal, which can lead to incorrect demodulation even in the absence of noise. The channel gain is
typically estimated at the receiver using pilot symbols. However, pilot symbols do not
lead to perfect channel estimates, and the estimation error can lead to bit errors. More details on the impact of
amplitude and phase estimation errors on the performance of MQAM modulation can be found in [15, Chapter
10.3][16].

6.1.5 Error Probability for FSK and CPFSK 

Let us first consider the error probability of traditional binary FSK with the coherent demodulator of Figure 5.24.
Since demodulation is coherent, we can neglect any phase offset in the carrier signals. The transmitted signal is
defined by

$$s_i(t) = A\sqrt{2/T_b}\cos(2\pi f_i t), \quad i = 1, 2. \tag{6.27}$$

So $E_b = A^2$ and $\gamma_b = A^2/N_0$. The input to the decision device is

$$z = x_1 - x_2. \tag{6.28}$$

The device outputs a 1 bit if $z > 0$ and a 0 bit if $z \le 0$. Let us assume that $s_1(t)$ is transmitted; then

$$z|1 = A + n_1 - n_2. \tag{6.29}$$

An error occurs if $z = A + n_1 - n_2 \le 0$. On the other hand, if $s_2(t)$ is transmitted, then

$$z|0 = n_1 - A - n_2, \tag{6.30}$$

and an error occurs if $z = n_1 - A - n_2 > 0$. For $n_1$ and $n_2$ independent white Gaussian random variables with
mean zero and variance $N_0/2$, their difference is a white Gaussian random variable with mean zero and variance
equal to the sum of variances $N_0/2 + N_0/2 = N_0$. Then for equally likely bit transmissions,

$$P_b = .5p(A + n_1 - n_2 \le 0) + .5p(n_1 - A - n_2 > 0) = Q\left(A/\sqrt{N_0}\right) = Q\left(\sqrt{\gamma_b}\right). \tag{6.31}$$



The derivation of $P_s$ for coherent MFSK with $M > 2$ is more complex and does not lead to a closed-form solution
[2, Equation 4.92]. The probability of symbol error for noncoherent MFSK is derived in [10, Chapter 8.1] as

$$P_s = \sum_{m=1}^{M}(-1)^{m+1}\binom{M-1}{m}\frac{1}{m+1}\exp\left[\frac{-m\gamma_s}{m+1}\right]. \tag{6.32}$$
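
The alternating sum in (6.32) is straightforward to evaluate directly. The sketch below uses an illustrative function name and stops the sum at $m = M - 1$, since the binomial coefficient $\binom{M-1}{m}$ vanishes at $m = M$. Two sanity checks follow from the formula itself: for $M = 2$ it reduces to $\frac{1}{2}e^{-\gamma_s/2}$, and at $\gamma_s = 0$ it gives $(M-1)/M$, the error probability of a random guess.

```python
import math

def mfsk_noncoherent_ps(gamma_s, M):
    """Symbol error probability of noncoherent MFSK, as in (6.32)."""
    total = 0.0
    for m in range(1, M):  # the m = M term is zero because C(M-1, M) = 0
        sign = 1.0 if m % 2 == 1 else -1.0
        total += sign * math.comb(M - 1, m) / (m + 1) * math.exp(-m * gamma_s / (m + 1))
    return total

gamma_s = 10 ** (10 / 10)   # 10 dB, an arbitrary illustrative value
for M in (2, 4, 8):
    print(f"M = {M}: Ps = {mfsk_noncoherent_ps(gamma_s, M):.3e}")
```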



The error probability of CPFSK depends on whether the detector is coherent or noncoherent, and also whether 
it uses symbol-by-symbol detection or sequence estimation. Analysis of error probability for CPFSK is complex 
since the memory in the modulation requires error probability analysis over multiple symbols. The formulas 
for error probability can also become quite complex. Detailed derivations of error probability for these different 
CPFSK structures can be found in [1, Chapter 5.3]. As with linear modulations, FSK performance degrades under
frequency and timing errors. A detailed analysis of the impact of such errors on FSK performance can be found in
[2, Chapter 5.2][13, 14].



6.1.6 Error Probability Approximation for Coherent Modulations 

Many of the approximations or exact values for $P_s$ derived above for coherent modulation are in the following
form:

$$P_s(\gamma_s) \approx \alpha_M Q\left(\sqrt{\beta_M\gamma_s}\right), \tag{6.33}$$

where $\alpha_M$ and $\beta_M$ depend on the type of approximation and the modulation type. In particular, the nearest
neighbor approximation has this form, where $\alpha_M$ is the number of nearest neighbors to a constellation point at the
minimum distance, and $\beta_M$ is a constant that relates minimum distance to average symbol energy. In Table 6.1 we
summarize the specific values of $\alpha_M$ and $\beta_M$ for common $P_s$ expressions for PSK, QAM, and FSK modulations
based on the derivations in the prior sections.

Performance specifications are generally more concerned with the bit error probability $P_b$ as a function of
the bit energy $\gamma_b$. To convert from $P_s$ to $P_b$ and from $\gamma_s$ to $\gamma_b$, we use the approximations (6.3) and (6.2), which
assume Gray encoding and high SNR. Using these approximations in (6.33) yields a simple formula for $P_b$ as a
function of $\gamma_b$:

$$P_b(\gamma_b) = \hat{\alpha}_M Q\left(\sqrt{\hat{\beta}_M\gamma_b}\right), \tag{6.34}$$

where $\hat{\alpha}_M = \alpha_M/\log_2 M$ and $\hat{\beta}_M = (\log_2 M)\beta_M$ for $\alpha_M$ and $\beta_M$ in (6.33). This conversion is used below to
obtain $P_b$ versus $\gamma_b$ from the general form of $P_s$ versus $\gamma_s$ in (6.33).
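
The general form (6.33) and its bit-level conversion (6.34) map naturally onto a pair of small functions. The sketch below is illustrative: the function names are hypothetical, and the MPSK parameters $\alpha_M = 2$, $\beta_M = 2\sin^2(\pi/M)$ are read off from (6.18).

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ps_general(gamma_s, alpha_m, beta_m):
    """Symbol error probability in the general form (6.33)."""
    return alpha_m * q_function(math.sqrt(beta_m * gamma_s))

def pb_general(gamma_b, alpha_m, beta_m, M):
    """Bit error probability in the form (6.34), assuming Gray coding and high SNR."""
    bits = math.log2(M)
    alpha_hat = alpha_m / bits      # alpha_M / log2(M)
    beta_hat = bits * beta_m        # (log2 M) * beta_M
    return alpha_hat * q_function(math.sqrt(beta_hat * gamma_b))

gamma_b = 10 ** 1.5                 # 15 dB
M = 8
alpha_m, beta_m = 2.0, 2.0 * math.sin(math.pi / M) ** 2   # MPSK, from (6.18)
print(f"8PSK Pb ~ {pb_general(gamma_b, alpha_m, beta_m, M):.3e}")
print(f"BPSK Pb = {pb_general(gamma_b, 1.0, 2.0, 2):.3e}")
```

Keeping $\alpha_M$ and $\beta_M$ as arguments lets the same two functions cover each modulation that fits the form (6.33).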

Modulation | $P_s(\gamma_s)$ | $P_b(\gamma_b)$
BFSK | | $P_b = Q\left(\sqrt{\gamma_b}\right)$
BPSK | | $P_b = Q\left(\sqrt{2\gamma_b}\right)$
QPSK, 4QAM | $P_s \approx 2Q\left(\sqrt{\gamma_s}\right)$ | $P_b \approx Q\left(\sqrt{2\gamma_b}\right)$
MPAM | $P_s \approx \frac{2(M-1)}{M}Q\left(\sqrt{\frac{6\bar{\gamma}_s}{M^2-1}}\right)$ | $P_b \approx \frac{2(M-1)}{M\log_2 M}Q\left(\sqrt{\frac{6\bar{\gamma}_b\log_2 M}{M^2-1}}\right)$
MPSK | $P_s \approx 2Q\left(\sqrt{2\gamma_s}\sin(\pi/M)\right)$ | $P_b \approx \frac{2}{\log_2 M}Q\left(\sqrt{2\gamma_b\log_2 M}\sin(\pi/M)\right)$
Rectangular MQAM | $P_s \approx \frac{4(\sqrt{M}-1)}{\sqrt{M}}Q\left(\sqrt{\frac{3\bar{\gamma}_s}{M-1}}\right)$ | $P_b \approx \frac{4(\sqrt{M}-1)}{\sqrt{M}\log_2 M}Q\left(\sqrt{\frac{3\bar{\gamma}_b\log_2 M}{M-1}}\right)$
Nonrectangular MQAM | $P_s \approx 4Q\left(\sqrt{\frac{3\bar{\gamma}_s}{M-1}}\right)$ | $P_b \approx \frac{4}{\log_2 M}Q\left(\sqrt{\frac{3\bar{\gamma}_b\log_2 M}{M-1}}\right)$

Table 6.1: Approximate Symbol and Bit Error Probabilities for Coherent Modulations

6.1.7 Error Probability for Differential Modulation

The probability of error for differential modulation is based on the phase difference associated with the phase
comparator input of Figure 5.20. Specifically, the phase comparator extracts the phase of

$$z(k)z^*(k-1) = A^2e^{j(\theta(k) - \theta(k-1))} + Ae^{j(\theta(k) + \phi_0)}n^*(k-1) + Ae^{-j(\theta(k-1) + \phi_0)}n(k) + n(k)n^*(k-1) \tag{6.35}$$

to determine the transmitted symbol. Due to symmetry, we can assume a given phase difference to compute the
error probability. Assuming a phase difference of zero, $\theta(k) - \theta(k-1) = 0$, yields

$$z(k)z^*(k-1) = A^2 + Ae^{j(\theta(k) + \phi_0)}n^*(k-1) + Ae^{-j(\theta(k-1) + \phi_0)}n(k) + n(k)n^*(k-1). \tag{6.36}$$

Next we define new random variables $\tilde{n}(k) = n(k)e^{-j(\theta(k-1)+\phi_0)}$ and $\tilde{n}(k-1) = n(k-1)e^{-j(\theta(k)+\phi_0)}$, which
have the same statistics as $n(k)$ and $n(k-1)$. Then we have

$$z(k)z^*(k-1) = A^2 + A\left(\tilde{n}^*(k-1) + \tilde{n}(k)\right) + \tilde{n}(k)\tilde{n}^*(k-1). \tag{6.37}$$

There are three terms in (6.37): the first term with the desired phase difference of zero, and the second and third
terms, which contribute noise. At reasonable SNRs the third noise term is much smaller than the second, so we
neglect it. Dividing the remaining terms by $A$ yields

$$\tilde{z} = A + \Re\{\tilde{n}^*(k-1) + \tilde{n}(k)\} + j\,\Im\{\tilde{n}^*(k-1) + \tilde{n}(k)\}. \tag{6.38}$$

Let us define $x = \Re\{\tilde{z}\}$ and $y = \Im\{\tilde{z}\}$. The phase of $\tilde{z}$ is thus given by

$$\theta_{\tilde{z}} = \tan^{-1}\frac{y}{x}. \tag{6.39}$$

Given that the phase difference was zero, an error occurs if $|\theta_{\tilde{z}}| \geq \pi/M$. Determining $p(|\theta_{\tilde{z}}| \geq \pi/M)$ is
identical to the case of coherent PSK, except that from (6.38) we see that we have two noise terms instead of one,
and therefore the noise power is twice that of the coherent case. This will lead to a performance of differential
modulation that is roughly 3 dB worse than that of coherent modulation.

In DPSK modulation we need only consider the in-phase branch of Figure 5.20 to make a decision, so we set
$x = \Re\{\tilde{z}\}$ in our analysis. In particular, assuming a zero is transmitted, if $x = A + \Re\{\tilde{n}^*(k-1) + \tilde{n}(k)\} < 0$
then a decision error is made. This probability can be obtained by finding the characteristic or moment-generating
function for $x$, taking the inverse Laplace transform to get the distribution of $x$, and then integrating over the
decision region $x < 0$. This technique is very general and can be applied to a wide variety of different modulation and
detection types in both AWGN and fading [10, Chapter 1.1]; we will use it later to compute the average probability
of symbol error for linear modulations in fading both with and without diversity. In DPSK the characteristic
function for $x$ is obtained using the general quadratic form of complex Gaussian random variables [1, Appendix
B] [18, Appendix B], and the resulting bit error probability is given by

$$P_b = \frac{1}{2}e^{-\gamma_b}. \tag{6.40}$$

For DQPSK the characteristic function for $\tilde{z}$ is obtained in [1, Appendix C], which yields the bit error probability

$$P_b \approx \int_b^\infty x \exp\left[-\frac{a^2+x^2}{2}\right] I_0(ax)\,dx - \frac{1}{2}\exp\left[-\frac{a^2+b^2}{2}\right] I_0(ab), \tag{6.41}$$

where $a \approx .765\sqrt{\gamma_b}$ and $b \approx 1.85\sqrt{\gamma_b}$.



6.2 Alternate Q Function Representation 



In (6.33) we saw that $P_s$ for many coherent modulation techniques in AWGN is approximated in terms of the
Gaussian Q function. Recall that $Q(z)$ is defined as the probability that a Gaussian random variable $x$ with mean
zero and variance one exceeds the value $z$, i.e.

$$Q(z) = p(x > z) = \int_z^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx. \tag{6.42}$$

The Q function is not that easy to work with, since the argument $z$ is in the lower limit of the integral, the
integration range is infinite, and the exponential function in the integrand doesn’t lead to a closed-form solution.

In 1991 an alternate representation of the Q function was obtained by Craig [5]. The alternate form is given
by

$$Q(z) = \frac{1}{\pi}\int_0^{\pi/2} \exp\left[\frac{-z^2}{2\sin^2\phi}\right] d\phi, \quad z > 0. \tag{6.43}$$

This representation can also be deduced from the work of Weinstein [6] or Pawula et al. [7]. Note that in this
alternate form, the integral is over a finite range that is independent of the function argument $z$, and the integrand
is Gaussian with respect to $z$. These features will prove important in using the alternate representation to derive
average error probability in fading.
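As a quick numerical sanity check of (6.43), the finite-range integral can be compared against the standard definition of $Q(z)$. The sketch below uses a simple midpoint rule; the function names and the number of integration points are arbitrary choices, not part of the original derivation.

```python
import math


def q_classic(z):
    """Q(z) from its definition (6.42), computed via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def q_craig(z, n=2000):
    """Q(z) from the alternate form (6.43), for z >= 0, using a midpoint rule on [0, pi/2]."""
    h = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        total += math.exp(-z * z / (2.0 * math.sin(phi) ** 2))
    return total * h / math.pi


for z in (0.5, 1.0, 2.0, 3.0):
    print(z, q_classic(z), q_craig(z))
```

Because the integration limits do not depend on $z$, the same grid of $\phi$ values can be reused for every argument, which is what makes this form convenient when averaging over a random SNR.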

Craig’s motivation for deriving the alternate representation was to simplify the probability of error calculation
for AWGN channels. In particular, we can write the probability of bit error for BPSK using the alternate form as

$$P_b = Q\left(\sqrt{2\gamma_b}\right) = \frac{1}{\pi}\int_0^{\pi/2} \exp\left[\frac{-\gamma_b}{\sin^2\phi}\right] d\phi. \tag{6.44}$$

Similarly, the alternate representation can be used to obtain a simple exact formula for $P_s$ of MPSK in AWGN as
[5]

$$P_s = \frac{1}{\pi}\int_0^{(M-1)\pi/M} \exp\left[\frac{-g_{psk}\gamma_s}{\sin^2\phi}\right] d\phi, \tag{6.45}$$

where $g_{psk} = \sin^2(\pi/M)$. Note that this formula does not correspond to the general form $\alpha_M Q\left(\sqrt{\beta_M\gamma_s}\right)$, since
the general form is an approximation while (6.45) is exact. Note also that (6.45) is obtained via a finite range
integral of simple trigonometric functions that is easily computed via a numerical computer package or calculator.
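The following sketch evaluates the exact MPSK expression (6.45) numerically and compares it with the approximation $2Q(\sqrt{2\gamma_s}\sin(\pi/M))$ from Table 6.1. The midpoint rule, the grid size, and the function names are assumptions made for illustration.

```python
import math


def q_func(z):
    """Gaussian Q function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def ps_mpsk_exact(M, gamma_s, n=4000):
    """Exact MPSK symbol error probability in AWGN from (6.45), midpoint rule."""
    g_psk = math.sin(math.pi / M) ** 2
    upper = (M - 1) * math.pi / M
    h = upper / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        total += math.exp(-g_psk * gamma_s / math.sin(phi) ** 2)
    return total * h / math.pi


def ps_mpsk_approx(M, gamma_s):
    """Approximation from Table 6.1: 2 Q(sqrt(2 gamma_s) sin(pi/M))."""
    return 2.0 * q_func(math.sqrt(2.0 * gamma_s) * math.sin(math.pi / M))


gamma_s = 10.0 ** 1.5   # 15 dB per symbol (arbitrary choice)
print(ps_mpsk_exact(8, gamma_s), ps_mpsk_approx(8, gamma_s))
```

For $M = 2$ the exact expression reduces to (6.44), and for $M = 4$ it agrees with $2Q(\sqrt{\gamma_s}) - Q^2(\sqrt{\gamma_s})$, which gives two independent checks on the integral.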



6.3 Fading 

In AWGN the probability of symbol error depends on the received SNR or, equivalently, on $\gamma_s$. In a fading
environment the received signal power varies randomly over distance or time due to shadowing and/or multipath
fading. Thus, in fading $\gamma_s$ is a random variable with distribution $p_{\gamma_s}(\gamma)$, and therefore $P_s(\gamma_s)$ is also random.
The performance metric when $\gamma_s$ is random depends on the rate of change of the fading. There are three different
performance criteria that can be used to characterize the random variable $P_s$:

• The outage probability, $P_{out}$, defined as the probability that $\gamma_s$ falls below a given value corresponding to
the maximum allowable $P_s$.

• The average error probability, $\bar{P}_s$, averaged over the distribution of $\gamma_s$.

• Combined average error probability and outage, defined as the average error probability that can be achieved
some percentage of time or some percentage of spatial locations.

The average probability of symbol error applies when the signal fading is on the order of a symbol time
($T_s \approx T_c$), so that the signal fade level is constant over roughly one symbol time. Since many error correction
coding techniques can recover from a few bit errors, and end-to-end performance is typically not seriously degraded 
by a few simultaneous bit errors, the average error probability is a reasonably good figure of merit for the channel 
quality under these conditions. 

However, if the signal power is changing slowly ($T_s \ll T_c$), then a deep fade will affect many simultaneous
symbols. Thus, fading may lead to large error bursts, which cannot be corrected for with coding of reasonable 
complexity. Therefore, these error bursts can seriously degrade end-to-end performance. In this case acceptable 
performance cannot be guaranteed over all time or, equivalently, throughout a cell, without drastically increasing 
transmit power. Under these circumstances, an outage probability is specified so that the channel is deemed 
unusable for some fraction of time or space. Outage and average error probability are often combined when 
the channel is modeled as a combination of fast and slow fading, e.g. log-normal shadowing with fast Rayleigh 
fading. 

Note that when $T_c \ll T_s$, the fading will be averaged out by the matched filter in the demodulator. Thus, for
very fast fading, performance is the same as in AWGN. 



6.3.1 Outage Probability 

The outage probability relative to $\gamma_0$ is defined as

$$P_{out} = p(\gamma_s < \gamma_0) = \int_0^{\gamma_0} p_{\gamma_s}(\gamma)\,d\gamma, \tag{6.46}$$

where $\gamma_0$ typically specifies the minimum SNR required for acceptable performance. For example, if we consider
digitized voice, $P_b = 10^{-3}$ is an acceptable error rate since it generally can’t be detected by the human ear. Thus,
for a BPSK signal in Rayleigh fading, $\gamma_b < 7$ dB would be declared an outage, so we set $\gamma_0 = 7$ dB.

In Rayleigh fading the outage probability becomes

$$P_{out} = \int_0^{\gamma_0} \frac{1}{\bar{\gamma}_s} e^{-\gamma_s/\bar{\gamma}_s}\,d\gamma_s = 1 - e^{-\gamma_0/\bar{\gamma}_s}. \tag{6.47}$$

Inverting this formula shows that for a given outage probability, the required average SNR $\bar{\gamma}_s$ is

$$\bar{\gamma}_s = \frac{\gamma_0}{-\ln(1 - P_{out})}. \tag{6.48}$$

In dB this means that $10\log\bar{\gamma}_s$ must exceed the target $10\log\gamma_0$ by $F_d = -10\log[-\ln(1 - P_{out})]$ to maintain
acceptable performance more than $100 \cdot (1 - P_{out})$ percent of the time. The quantity $F_d$ is typically called the dB
fade margin.



Example 6.4: Determine the required $\bar{\gamma}_b$ for BPSK modulation in slow Rayleigh fading such that 95% of the time
(or in space), $P_b(\gamma_b) < 10^{-4}$.

Solution: For BPSK modulation in AWGN the target BER is obtained at 8.5 dB, i.e. for $P_b(\gamma_b) = Q\left(\sqrt{2\gamma_b}\right)$,
$P_b(10^{.85}) = 10^{-4}$. Thus, $\gamma_0 = 8.5$ dB. Since we want $P_{out} = p(\gamma_b < \gamma_0) = .05$ we have

$$\bar{\gamma}_b = \frac{\gamma_0}{-\ln(1 - P_{out})} = \frac{10^{.85}}{-\ln(1 - .05)} = 21.4 \text{ dB}. \tag{6.49}$$
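The fade margin calculation of (6.47) and (6.48) is easy to reproduce numerically. The sketch below assumes Rayleigh fading and works in dB; the function names are hypothetical.

```python
import math


def fade_margin_db(p_out):
    """dB fade margin F_d = -10 log10[-ln(1 - P_out)] for Rayleigh fading."""
    return -10.0 * math.log10(-math.log(1.0 - p_out))


def required_avg_snr_db(gamma0_db, p_out):
    """Average SNR in dB needed so that P(gamma < gamma0) = p_out, from (6.48)."""
    return gamma0_db + fade_margin_db(p_out)


def outage_rayleigh(gamma0_db, avg_snr_db):
    """Outage probability (6.47) for a target gamma0 and average SNR, both in dB."""
    gamma0 = 10.0 ** (gamma0_db / 10.0)
    avg = 10.0 ** (avg_snr_db / 10.0)
    return 1.0 - math.exp(-gamma0 / avg)


# Numbers of Example 6.4: target 8.5 dB, 5% outage.
avg_db = required_avg_snr_db(8.5, 0.05)
print(round(avg_db, 1), "dB; outage check:", outage_rayleigh(8.5, avg_db))
```

With the values of Example 6.4 the fade margin comes out at about 12.9 dB, so the required average SNR is about 21.4 dB.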



6.3.2 Average Probability of Error 

The average probability of error is used as a performance metric when $T_s \approx T_c$. Thus, we can assume that $\gamma_s$ is
roughly constant over a symbol time. Then the averaged probability of error is computed by integrating the error
probability in AWGN over the fading distribution:

$$\bar{P}_s = \int_0^\infty P_s(\gamma)\,p_{\gamma_s}(\gamma)\,d\gamma, \tag{6.50}$$

where $P_s(\gamma)$ is the probability of symbol error in AWGN with SNR $\gamma$, which can be approximated by the
expressions in Table 6.1. For a given distribution of the fading amplitude $r$ (e.g. Rayleigh, Rician, or log-normal), we
compute $p_{\gamma_s}(\gamma)$ by making the change of variable

$$p_{\gamma_s}(\gamma)\,d\gamma = p(r)\,dr. \tag{6.51}$$

For example, in Rayleigh fading the received signal amplitude $r$ has the Rayleigh distribution

$$p(r) = \frac{r}{\sigma^2} e^{-r^2/2\sigma^2}, \quad r \geq 0, \tag{6.52}$$

and the signal power is exponentially distributed with mean $2\sigma^2$. The SNR per symbol for a given amplitude $r$ is

$$\gamma = \frac{r^2 T_s}{2\sigma_n^2}, \tag{6.53}$$

where $\sigma_n^2 = N_0/2$ is the PSD of the noise in the in-phase and quadrature branches. Differentiating both sides of
this expression yields

$$d\gamma = \frac{r T_s}{\sigma_n^2}\,dr. \tag{6.54}$$

Substituting (6.53) and (6.54) into (6.52) and then (6.51) yields

$$p_{\gamma_s}(\gamma) = \frac{\sigma_n^2}{\sigma^2 T_s} e^{-\gamma\sigma_n^2/\sigma^2 T_s}. \tag{6.55}$$

Since the average SNR per symbol $\bar{\gamma}_s$ is just $\sigma^2 T_s/\sigma_n^2$, we can rewrite (6.55) as

$$p_{\gamma_s}(\gamma) = \frac{1}{\bar{\gamma}_s} e^{-\gamma/\bar{\gamma}_s}, \tag{6.56}$$

which is exponential. For binary signaling this reduces to

$$p_{\gamma_b}(\gamma) = \frac{1}{\bar{\gamma}_b} e^{-\gamma/\bar{\gamma}_b}. \tag{6.57}$$
Integrating (6.6) over the distribution (6.57) yields the following average probability of error for BPSK in
Rayleigh fading.

$$\text{BPSK:} \quad \bar{P}_b = \frac{1}{2}\left[1 - \sqrt{\frac{\bar{\gamma}_b}{1 + \bar{\gamma}_b}}\right] \approx \frac{1}{4\bar{\gamma}_b}, \tag{6.58}$$

where the approximation holds for large $\bar{\gamma}_b$. A similar integration of (6.31) over (6.57) yields the average
probability of error for binary FSK in Rayleigh fading as

$$\text{Binary FSK:} \quad \bar{P}_b = \frac{1}{2}\left[1 - \sqrt{\frac{\bar{\gamma}_b}{2 + \bar{\gamma}_b}}\right] \approx \frac{1}{2\bar{\gamma}_b}. \tag{6.59}$$

Thus, at high SNRs binary FSK requires roughly 3 dB more average SNR than BPSK for the same bit error
probability, just as in AWGN. For noncoherent modulation, if we
assume the channel phase is relatively constant over a symbol time, then we obtain the probability of error by
again integrating the error probability in AWGN over the fading distribution. For DPSK this yields

$$\text{DPSK:} \quad \bar{P}_b = \frac{1}{2(1 + \bar{\gamma}_b)} \approx \frac{1}{2\bar{\gamma}_b}, \tag{6.60}$$



where again the approximation holds for large $\bar{\gamma}_b$. Note that in the limit of large $\bar{\gamma}_b$, there is an approximate 3 dB
power penalty in using DPSK instead of BPSK. This was also observed in AWGN, and is the power penalty of
differential detection. In practice the power penalty is somewhat smaller, since DPSK can correct for slow phase
changes introduced in the channel or receiver, which are not taken into account in these error calculations.

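If we use the general approximation $P_s \approx \alpha_M Q\left(\sqrt{\beta_M\gamma_s}\right)$ then the average probability of symbol error in
Rayleigh fading can be approximated as

$$\bar{P}_s \approx \int_0^\infty \alpha_M Q\left(\sqrt{\beta_M\gamma}\right) \cdot \frac{1}{\bar{\gamma}_s} e^{-\gamma/\bar{\gamma}_s}\,d\gamma = \frac{\alpha_M}{2}\left[1 - \sqrt{\frac{.5\beta_M\bar{\gamma}_s}{1 + .5\beta_M\bar{\gamma}_s}}\right] \approx \frac{\alpha_M}{2\beta_M\bar{\gamma}_s}, \tag{6.61}$$

where the last approximation is in the limit of high SNR.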
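The closed form in (6.61) can be checked against a direct numerical average over the exponential SNR distribution. In this sketch the substitution $u = e^{-\gamma/\bar{\gamma}_s}$ maps the integral onto $(0, 1)$ so that a plain midpoint rule suffices; the function names and grid size are illustrative assumptions.

```python
import math


def q_func(z):
    """Gaussian Q function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def avg_ps_rayleigh(alpha_m, beta_m, gbar):
    """Closed form of (6.61) for P_s ~ alpha_M Q(sqrt(beta_M gamma_s))."""
    x = 0.5 * beta_m * gbar
    return 0.5 * alpha_m * (1.0 - math.sqrt(x / (1.0 + x)))


def avg_ps_rayleigh_numeric(alpha_m, beta_m, gbar, n=100000):
    """Midpoint-rule evaluation of the integral in (6.61) after u = exp(-gamma/gbar)."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        gamma = -gbar * math.log(u)
        total += alpha_m * q_func(math.sqrt(beta_m * gamma))
    return total / n


# BPSK corresponds to alpha_M = 1, beta_M = 2, which recovers (6.58).
gbar = 10.0 ** 2.4   # 24 dB average SNR
print(avg_ps_rayleigh(1.0, 2.0, gbar), 1.0 / (4.0 * gbar))
```

At 24 dB average SNR the BPSK result is close to $10^{-3}$, and the high-SNR approximation $\alpha_M/(2\beta_M\bar{\gamma}_s)$ is already accurate to within about one percent.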

It is interesting to compare the bit error probability of the different modulation schemes in AWGN and fading.
For binary PSK, FSK, and DPSK, the bit error probability in AWGN decreases exponentially with increasing $\gamma_b$.
However, in fading the bit error probability for all the modulation types decreases only as $1/\bar{\gamma}_b$.
Similar behavior occurs for nonbinary modulation. Thus, the power necessary to maintain a given $\bar{P}_b$, particularly
for small values, is much higher in fading channels than in AWGN channels. For example, in Figure 6.1 we plot
the error probability of BPSK in AWGN and in flat Rayleigh fading. We see that it requires approximately 7 dB
SNR to maintain a $10^{-3}$ bit error rate in AWGN while it requires approximately 24 dB SNR to maintain the same
error rate in fading. A similar plot for the error probabilities of MQAM, based on the approximations (6.24) and
(6.61), is shown in Figure 6.2. From these figures it is clear that to maintain low power requires some technique
to remove the effects of fading. We will discuss some of these techniques, including diversity combining, spread
spectrum, and RAKE receivers, in later chapters.

Rayleigh fading is one of the worst-case fading scenarios. In Figure 6.3 we show the average bit error
probability of BPSK in Nakagami fading for different values of the Nakagami-m parameter. We see that as $m$ increases,
the fading decreases, and the average bit error probability converges to that of an AWGN channel.



6.3.3 Moment Generating Function Approach to Average Error Probability 

The moment generating function (MGF) for a nonnegative random variable $\gamma$ with pdf $p_\gamma(\gamma)$, $\gamma \geq 0$, is defined
as

$$\mathcal{M}_\gamma(s) = \int_0^\infty p_\gamma(\gamma) e^{s\gamma}\,d\gamma. \tag{6.62}$$

Figure 6.1: Average $P_b$ for BPSK in Rayleigh Fading and AWGN.

Note that this function is just the Laplace transform of the pdf $p_\gamma(\gamma)$ with the argument reversed in sign:
$\mathcal{L}[p_\gamma(\gamma)] = \mathcal{M}_\gamma(-s)$. Thus, the MGF for most fading distributions of interest can be computed either in closed form using
classical Laplace transforms or through numerical integration. In particular, the MGFs for common multipath fading
distributions are as follows [10, Chapter 5.1].



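Rayleigh:

$$\mathcal{M}_{\gamma_s}(s) = (1 - s\bar{\gamma}_s)^{-1}. \tag{6.63}$$

Rician with factor $K$:

$$\mathcal{M}_{\gamma_s}(s) = \frac{1 + K}{1 + K - s\bar{\gamma}_s} \exp\left[\frac{Ks\bar{\gamma}_s}{1 + K - s\bar{\gamma}_s}\right]. \tag{6.64}$$

Nakagami-m:

$$\mathcal{M}_{\gamma_s}(s) = \left(1 - \frac{s\bar{\gamma}_s}{m}\right)^{-m}. \tag{6.65}$$

As indicated by its name, the moments $E[\gamma^n]$ of $\gamma$ can be obtained from $\mathcal{M}_\gamma(s)$ as

$$E[\gamma^n] = \frac{\partial^n}{\partial s^n}\left[\mathcal{M}_{\gamma_s}(s)\right]\Big|_{s=0}. \tag{6.66}$$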



The MGF is a very useful tool in performance analysis of modulation in fading both with and without diversity. 
In this section we discuss how it can be used to simplify performance analysis of average probability of symbol 
error in fading. In the next chapter we will see that it also greatly simplifies analysis in fading channels with 
diversity. 

Figure 6.2: Average $P_b$ for MQAM in Rayleigh Fading and AWGN.

Figure 6.3: Average $P_b$ for BPSK in Nakagami Fading.

The basic premise of the MGF approach for computing average error probability in fading is to express the
probability of error $P_s$ in AWGN for the modulation of interest either as an exponential function of $\gamma_s$,

$$P_s = c_1 \exp[-c_2\gamma_s] \tag{6.67}$$

for constants $c_1$ and $c_2$, or as a finite range integral of such an exponential function:

$$P_s = \int_A^B c_1 \exp[-c_2(x)\gamma]\,dx, \tag{6.68}$$

where the constant $c_2(x)$ may depend on the integration variable $x$ but not on the SNR $\gamma$, and $\gamma$ does not appear
in the limits of integration either. These forms allow the average probability of error to be expressed in terms of the MGF for the fading
distribution. Specifically, if $P_s = c_1 \exp[-c_2\gamma_s]$, then

$$\bar{P}_s = \int_0^\infty c_1 \exp[-c_2\gamma]\,p_{\gamma_s}(\gamma)\,d\gamma = c_1\mathcal{M}_{\gamma_s}(-c_2). \tag{6.69}$$

Since DPSK is in this form with $c_1 = 1/2$ and $c_2 = 1$, we see that the average probability of bit error for DPSK in
any type of fading is

$$\bar{P}_b = \frac{1}{2}\mathcal{M}_{\gamma_s}(-1), \tag{6.70}$$
where $\mathcal{M}_{\gamma_s}(s)$ is the MGF of the fading distribution. For example, using $\mathcal{M}_{\gamma_s}(s)$ for Rayleigh fading given by
(6.63) with $s = -1$ yields $\bar{P}_b = [2(1 + \bar{\gamma}_b)]^{-1}$, which is the same as we obtained in (6.60). If $P_s$ is in the integral
form of (6.68) then

$$\bar{P}_s = \int_0^\infty \int_A^B c_1 \exp[-c_2(x)\gamma]\,dx\,p_{\gamma_s}(\gamma)\,d\gamma = c_1\int_A^B \left[\int_0^\infty \exp[-c_2(x)\gamma]\,p_{\gamma_s}(\gamma)\,d\gamma\right] dx = c_1\int_A^B \mathcal{M}_{\gamma_s}(-c_2(x))\,dx. \tag{6.71}$$

In this latter case, the average probability of symbol error is a single finite-range integral of the MGF of the fading
distribution, which can typically be found in closed form or easily evaluated numerically.
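The MGFs (6.63) to (6.65) and the DPSK result (6.70) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the moment property (6.66) is checked with a simple central difference rather than symbolically.

```python
import math


def mgf_rayleigh(s, gbar):
    """Rayleigh MGF (6.63)."""
    return 1.0 / (1.0 - s * gbar)


def mgf_rician(s, gbar, K):
    """Rician MGF (6.64) with factor K."""
    d = 1.0 + K - s * gbar
    return (1.0 + K) / d * math.exp(K * s * gbar / d)


def mgf_nakagami(s, gbar, m):
    """Nakagami-m MGF (6.65)."""
    return (1.0 - s * gbar / m) ** (-m)


def avg_pb_dpsk(mgf, *params):
    """Average DPSK bit error probability (6.70): one half of the MGF at s = -1."""
    return 0.5 * mgf(-1.0, *params)


def first_moment(mgf, *params, h=1e-6):
    """Numerical version of (6.66) for n = 1: derivative of the MGF at s = 0."""
    return (mgf(h, *params) - mgf(-h, *params)) / (2.0 * h)


gbar = 10.0   # 10 dB average SNR
print(avg_pb_dpsk(mgf_rayleigh, gbar))        # Rayleigh fading
print(avg_pb_dpsk(mgf_rician, gbar, 5.0))     # Rician fading, K = 5
print(avg_pb_dpsk(mgf_nakagami, gbar, 2.0))   # Nakagami fading, m = 2
```

Setting $K = 0$ in the Rician MGF or $m = 1$ in the Nakagami MGF reproduces the Rayleigh result $[2(1 + \bar{\gamma}_b)]^{-1}$, which is a useful consistency check.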

Let us now apply the MGF approach to specific modulations and fading distributions. In (6.33) we gave a
general expression for $P_s$ of coherent modulation in AWGN in terms of the Gaussian Q function. We now make a
slight change of notation in (6.33), setting $\alpha = \alpha_M$ and $g = .5\beta_M$ to get

$$P_s(\gamma_s) = \alpha Q\left(\sqrt{2g\gamma_s}\right), \tag{6.72}$$

where $\alpha$ and $g$ are constants that depend on the modulation. The notation change is to obtain the error probability
as an exact MGF, as we now show.

Using the alternate Q function representation (6.43), we get that

$$P_s = \frac{\alpha}{\pi}\int_0^{\pi/2} \exp\left[\frac{-g\gamma_s}{\sin^2\phi}\right] d\phi, \tag{6.73}$$

which is in the desired form (6.68). Thus, the average error probability in fading for modulations with
$P_s = \alpha Q\left(\sqrt{2g\gamma_s}\right)$ in AWGN is given by

$$\bar{P}_s = \frac{\alpha}{\pi}\int_0^{\pi/2} \mathcal{M}_{\gamma_s}\left(\frac{-g}{\sin^2\phi}\right) d\phi, \tag{6.74}$$

where $\mathcal{M}_{\gamma_s}(s)$ is the MGF associated with the pdf $p_{\gamma_s}(\gamma)$ as defined by (6.62). Recall that Table 6.1 approximates
the error probability in AWGN for many modulations of interest as $P_s \approx \alpha Q\left(\sqrt{2g\gamma_s}\right)$, and thus (6.74) gives
an approximation for the average error probability of these modulations in fading. Moreover, the exact average
probability of symbol error for coherent MPSK can be obtained in a form similar to (6.74) by noting that Craig’s
formula for $P_s$ of MPSK in AWGN given by (6.45) is in the desired form (6.68). Thus, the exact average probability
of error for MPSK becomes

$$\bar{P}_s = \frac{1}{\pi}\int_0^{(M-1)\pi/M} \mathcal{M}_{\gamma_s}\left(\frac{-g}{\sin^2\phi}\right) d\phi, \tag{6.75}$$

where $g = \sin^2(\pi/M)$ depends on the size of the MPSK constellation. The MGFs $\mathcal{M}_{\gamma_s}(s)$ for Rayleigh, Rician,
and Nakagami-m distributions were given, respectively, by (6.63), (6.64), and (6.65) above. Substituting
$s = -g/\sin^2\phi$ in these expressions yields

Rayleigh:

$$\mathcal{M}_{\gamma_s}\left(\frac{-g}{\sin^2\phi}\right) = \left(1 + \frac{g\bar{\gamma}_s}{\sin^2\phi}\right)^{-1}. \tag{6.76}$$

Rician with factor $K$:

$$\mathcal{M}_{\gamma_s}\left(\frac{-g}{\sin^2\phi}\right) = \frac{(1 + K)\sin^2\phi}{(1 + K)\sin^2\phi + g\bar{\gamma}_s} \exp\left[-\frac{Kg\bar{\gamma}_s}{(1 + K)\sin^2\phi + g\bar{\gamma}_s}\right]. \tag{6.77}$$

Nakagami-m:

$$\mathcal{M}_{\gamma_s}\left(\frac{-g}{\sin^2\phi}\right) = \left(1 + \frac{g\bar{\gamma}_s}{m\sin^2\phi}\right)^{-m}. \tag{6.78}$$

All of these functions are simple trigonometric expressions and are therefore easy to integrate over the finite range in
(6.74) or (6.75).



Example 6.5: Use the MGF technique to find an expression for the average probability of error for BPSK
modulation in Nakagami fading.

Solution: We use the fact that for an AWGN channel BPSK has $P_b = Q\left(\sqrt{2\gamma_b}\right)$, so $\alpha = 1$ and $g = 1$ in (6.72).
The moment generating function for Nakagami-m fading is given by (6.78), and substituting this into (6.74) with
$\alpha = g = 1$ yields

$$\bar{P}_b = \frac{1}{\pi}\int_0^{\pi/2} \left(1 + \frac{\bar{\gamma}_b}{m\sin^2\phi}\right)^{-m} d\phi.$$
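The finite-range MGF integrals (6.74) and (6.75), including the Nakagami case of Example 6.5, can be evaluated with a few lines of numerical integration. The sketch below uses a midpoint rule; the helper names and grid size are assumptions made for illustration.

```python
import math


def mgf_nakagami(s, gbar, m):
    """Nakagami-m MGF (6.65); m = 1 corresponds to Rayleigh fading."""
    return (1.0 - s * gbar / m) ** (-m)


def avg_ps_mgf(alpha, g, mgf, upper=math.pi / 2.0, n=2000):
    """Evaluate (alpha/pi) * integral_0^upper mgf(-g / sin^2(phi)) dphi by the midpoint rule.

    upper = pi/2 gives (6.74); alpha = 1 with upper = (M-1) pi / M gives (6.75).
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        total += mgf(-g / math.sin(phi) ** 2)
    return alpha * total * h / math.pi


def avg_pb_bpsk_nakagami(gbar, m):
    """Example 6.5: BPSK (alpha = g = 1) in Nakagami-m fading."""
    return avg_ps_mgf(1.0, 1.0, lambda s: mgf_nakagami(s, gbar, m))


gbar = 10.0   # 10 dB average SNR per bit
for m in (1, 2, 4, 8):
    print(m, avg_pb_bpsk_nakagami(gbar, m))
```

For $m = 1$ the result matches the Rayleigh closed form (6.58), and the error probability falls as $m$ grows, consistent with the trend described for Figure 6.3.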



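From (6.23) we see that the exact probability of symbol error for MQAM in AWGN contains both the Q function
and its square. Fortunately, an alternate form of $Q^2(z)$ derived in [8] allows us to apply the same techniques
used above for MPSK to MQAM modulation. Specifically, an alternate representation of $Q^2(z)$ is derived in [8] as

$$Q^2(z) = \frac{1}{\pi}\int_0^{\pi/4} \exp\left[\frac{-z^2}{2\sin^2\phi}\right] d\phi. \tag{6.79}$$

Note that this is identical to the alternate representation for $Q(z)$ given in (6.43) except that the upper limit of the
integral is $\pi/4$ instead of $\pi/2$. Thus we can write (6.23) in terms of the alternate representations for $Q(z)$ and
$Q^2(z)$ as

$$P_s(\gamma_s) = \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right)\int_0^{\pi/2} \exp\left[-\frac{g\gamma_s}{\sin^2\phi}\right] d\phi - \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right)^2 \int_0^{\pi/4} \exp\left[-\frac{g\gamma_s}{\sin^2\phi}\right] d\phi, \tag{6.80}$$

where $g = 1.5/(M - 1)$ is a function of the size of the MQAM constellation. Then the average probability of
symbol error in fading becomes

$$\bar{P}_s = \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right)\int_0^{\pi/2} \mathcal{M}_{\gamma_s}\left(-\frac{g}{\sin^2\phi}\right) d\phi - \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right)^2 \int_0^{\pi/4} \mathcal{M}_{\gamma_s}\left(-\frac{g}{\sin^2\phi}\right) d\phi. \tag{6.81}$$

Thus, the exact average probability of symbol error is obtained via two finite-range integrals of the MGF of the
fading distribution, which can typically be found in closed form or easily evaluated numerically.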
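As an illustration of the two-integral MGF expression for MQAM, the sketch below evaluates it for Rayleigh fading and cross-checks the result by directly averaging the AWGN symbol error probability over the exponential SNR distribution. It assumes the square-MQAM AWGN expression $P_s = 1 - \left(1 - 2(1 - 1/\sqrt{M})Q\left(\sqrt{3\gamma_s/(M-1)}\right)\right)^2$, which is taken here to be the form referenced as (6.23); all function names are hypothetical.

```python
import math


def q_func(z):
    """Gaussian Q function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def _midpoint(f, upper, n):
    """Midpoint-rule integral of f over [0, upper]."""
    h = upper / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def avg_ps_mqam_rayleigh(M, gbar, n=4000):
    """Average P_s of square MQAM in Rayleigh fading via the two MGF integrals."""
    g = 1.5 / (M - 1)
    c = 1.0 - 1.0 / math.sqrt(M)

    def mgf_term(phi):
        # Rayleigh MGF (6.63) evaluated at s = -g / sin^2(phi)
        return 1.0 / (1.0 + g * gbar / math.sin(phi) ** 2)

    first = _midpoint(mgf_term, math.pi / 2.0, n)
    second = _midpoint(mgf_term, math.pi / 4.0, n)
    return (4.0 / math.pi) * (c * first - c * c * second)


def avg_ps_mqam_direct(M, gbar, n=100000):
    """Cross-check: average the assumed AWGN expression over the exponential pdf,
    using the substitution u = exp(-gamma / gbar)."""
    c = 1.0 - 1.0 / math.sqrt(M)
    total = 0.0
    for i in range(n):
        gamma = -gbar * math.log((i + 0.5) / n)
        q = q_func(math.sqrt(3.0 * gamma / (M - 1)))
        total += 1.0 - (1.0 - 2.0 * c * q) ** 2
    return total / n


print(avg_ps_mqam_rayleigh(16, 100.0), avg_ps_mqam_direct(16, 100.0))
```

The MGF route needs only a few thousand evaluations of a smooth trigonometric integrand, whereas the direct average has to resolve the Q function over the whole SNR range.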
The MGF approach can also be applied to noncoherent and differential modulations. For example, consider
noncoherent M-FSK, with $P_s$ in AWGN given by (6.32), which is a finite sum of the desired form (6.67). Thus,
in fading, the average symbol error probability of noncoherent M-FSK is given by

$$\bar{P}_s = \sum_{m=1}^{M-1} (-1)^{m+1}\binom{M-1}{m}\frac{1}{m+1}\,\mathcal{M}_{\gamma_s}\left(-\frac{m}{m+1}\right). \tag{6.82}$$

Finally, for differential MPSK, it can be shown [11] that the probability of symbol error in AWGN is given by

$$P_s = \frac{\sqrt{g_{psk}}}{2\pi}\int_{-\pi/2}^{\pi/2} \frac{\exp\left[-\gamma_s\left(1 - \sqrt{1 - g_{psk}}\cos\theta\right)\right]}{1 - \sqrt{1 - g_{psk}}\cos\theta}\,d\theta \tag{6.83}$$

for $g_{psk} = \sin^2(\pi/M)$, which is in the desired form (6.68). Thus we can express the average probability of symbol
error in terms of the MGF of the fading distribution as

$$\bar{P}_s = \frac{\sqrt{g_{psk}}}{2\pi}\int_{-\pi/2}^{\pi/2} \frac{\mathcal{M}_{\gamma_s}\left(-\left(1 - \sqrt{1 - g_{psk}}\cos\theta\right)\right)}{1 - \sqrt{1 - g_{psk}}\cos\theta}\,d\theta. \tag{6.84}$$

A more extensive discussion of the MGF technique for finding average probability of symbol error for different
modulations and fading distributions can be found in [10, Chapter 8.2].



6.3.4 Combined Outage and Average Error Probability 

When the fading environment is a superposition of both fast and slow fading, e.g. log-normal shadowing and
Rayleigh fading, a common performance metric is combined outage and average error probability, where outage
occurs when the slow fading falls below some target value and the average performance in nonoutage is obtained
by averaging over the fast fading. We use the following notation:

• Let $\bar{\bar{\gamma}}_s$ denote the average SNR per symbol for a fixed path loss with averaging over fast fading and
shadowing.

• Let $\bar{\gamma}_s$ denote the (random) SNR per symbol for a fixed path loss and random shadowing but averaged over
fast fading. Its average value is $\bar{\bar{\gamma}}_s$.

• Let $\gamma_s$ denote the random SNR due to fixed path loss, shadowing, and multipath.

With this notation we can specify an average error probability $\bar{P}_s$ with some probability $1 - P_{out}$. An outage is
declared when the received SNR per symbol due to shadowing and path loss alone, $\bar{\gamma}_s$, falls below a given target
value $\bar{\gamma}_{s_0}$. When not in outage ($\bar{\gamma}_s \geq \bar{\gamma}_{s_0}$), the average probability of error is obtained by averaging over the
distribution of the fast fading conditioned on the mean SNR:

$$\bar{P}_s = \int_0^\infty P_s(\gamma_s)\,p(\gamma_s|\bar{\gamma}_s)\,d\gamma_s. \tag{6.85}$$

The criterion used to determine the outage target $\bar{\gamma}_{s_0}$ is typically based on a given maximum average probability
of error, i.e. $\bar{P}_s \leq \bar{P}_{s_0}$, where the target $\bar{\gamma}_{s_0}$ must then satisfy

$$\bar{P}_{s_0} = \int_0^\infty P_s(\gamma_s)\,p(\gamma_s|\bar{\gamma}_{s_0})\,d\gamma_s. \tag{6.86}$$

Clearly whenever $\bar{\gamma}_s > \bar{\gamma}_{s_0}$, the average error probability will be below the target value.



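Example 6.6:

Consider BPSK modulation in a channel with both log-normal shadowing ($\sigma = 8$ dB) and Rayleigh fading.
The desired maximum average error probability is $\bar{P}_{b_0} = 10^{-4}$, which requires $\bar{\gamma}_{b_0} = 34$ dB. Determine the value
of $\bar{\bar{\gamma}}_b$ that will ensure $\bar{P}_b \leq 10^{-4}$ with probability $1 - P_{out} = .95$.

Solution: We must find $\bar{\bar{\gamma}}_b$, the average of $\gamma_b$ in both the fast and slow fading, such that $p(\bar{\gamma}_b > \bar{\gamma}_{b_0}) = 1 - P_{out}$.
For log-normal shadowing we compute this as:

$$p(\bar{\gamma}_b > 34) = p\left(\frac{\bar{\gamma}_b - \bar{\bar{\gamma}}_b}{\sigma} \geq \frac{34 - \bar{\bar{\gamma}}_b}{\sigma}\right) = Q\left(\frac{34 - \bar{\bar{\gamma}}_b}{\sigma}\right) = 1 - P_{out}, \tag{6.87}$$

since $(\bar{\gamma}_b - \bar{\bar{\gamma}}_b)/\sigma$ is a Gauss-distributed random variable with mean zero and standard deviation one (with all
SNRs expressed in dB). Thus, the
value of $\bar{\bar{\gamma}}_b$ is obtained by substituting the values of $P_{out}$ and $\sigma$ in (6.87) and using a table of Q functions or an
inversion program, which yields $(34 - \bar{\bar{\gamma}}_b)/8 = -1.6$ or $\bar{\bar{\gamma}}_b = 46.8$ dB.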
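The two steps of Example 6.6 can be reproduced with the standard library. This is a sketch under the example's assumptions (BPSK, Rayleigh fast fading, log-normal shadowing with SNRs in dB); the function names are hypothetical. Note that the example rounds the Gaussian quantile to $-1.6$, whereas the unrounded value is about $-1.645$, so the computed mean is roughly 47.2 dB rather than 46.8 dB.

```python
import math
from statistics import NormalDist


def bpsk_rayleigh_target_db(pb_target):
    """Average SNR (dB) at which BPSK in Rayleigh fading reaches pb_target, by inverting (6.58)."""
    r = (1.0 - 2.0 * pb_target) ** 2     # r = gbar / (1 + gbar)
    gbar = r / (1.0 - r)
    return 10.0 * math.log10(gbar)


def required_mean_db(target_db, sigma_db, p_out):
    """Mean dB SNR such that P(SNR_dB < target_db) = p_out under log-normal shadowing."""
    # Q(x) = 1 - p_out is equivalent to Phi(x) = p_out for the standard normal cdf Phi.
    x = NormalDist().inv_cdf(p_out)
    return target_db - sigma_db * x


target_db = bpsk_rayleigh_target_db(1e-4)          # close to 34 dB
print(round(target_db, 2), round(required_mean_db(34.0, 8.0, 0.05), 2))
```

The gap between the target and the required mean (about 13 dB here) is the shadowing margin, which scales linearly with the shadowing standard deviation $\sigma$.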



6.4 Doppler Spread 



Doppler spread results in an irreducible error floor for modulation techniques using differential detection. This
is due to the fact that in differential modulation the signal phase associated with one symbol is used as a phase
reference for the next symbol. If the channel phase decorrelates over a symbol, then the phase reference becomes
extremely noisy, leading to a high symbol error rate that is independent of received signal power. The phase
correlation between symbols and therefore the degradation in performance are functions of the Doppler frequency
$f_D = v/\lambda$ and the symbol time $T_s$.

The first analysis of the irreducible error floor due to Doppler was done by Bello and Nelin in [17]. In that
work analytical expressions for the irreducible error floor of noncoherent FSK and DPSK due to Doppler are
determined for a Gaussian Doppler power spectrum. However, these expressions are not in closed form, so they must
be evaluated numerically. Closed-form expressions for the bit error probability of DPSK in fast Rician fading,
where the channel decorrelates over a bit time, can be obtained using the MGF technique, with the MGF obtained
based on the general quadratic form of complex Gaussian random variables [18, Appendix B] [1, Appendix B]. A
different approach utilizing alternate forms of the Marcum Q function can also be used [10, Chapter 8.2.5]. The
resulting average bit error probability for DPSK is
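$$\bar{P}_b = \frac{1}{2}\left(\frac{1 + K + \bar{\gamma}_b(1 - \rho_C)}{1 + K + \bar{\gamma}_b}\right)\exp\left[-\frac{K\bar{\gamma}_b}{1 + K + \bar{\gamma}_b}\right], \tag{6.88}$$

where $\rho_C$ is the channel correlation coefficient after a bit time $T_b$, $K$ is the fading parameter of the Rician
distribution, and $\bar{\gamma}_b$ is the average SNR per bit. For Rayleigh fading ($K = 0$) this simplifies to

$$\bar{P}_b = \frac{1}{2}\left(\frac{1 + \bar{\gamma}_b(1 - \rho_C)}{1 + \bar{\gamma}_b}\right). \tag{6.89}$$

Letting $\bar{\gamma}_b \to \infty$ in (6.88) yields the irreducible error floor:

$$\text{DPSK:} \quad P_{floor} = \frac{(1 - \rho_C)e^{-K}}{2}. \tag{6.90}$$

A similar approach is used in [20] to bound the bit error probability of DQPSK in fast Rician fading as

$$\bar{P}_b \leq \frac{1}{2}\left[1 - \sqrt{\frac{(\rho_C\bar{\gamma}_s/\sqrt{2})^2}{(\bar{\gamma}_s + 1)^2 - (\rho_C\bar{\gamma}_s/\sqrt{2})^2}}\right]\exp\left[-\frac{(2 - \sqrt{2})K\bar{\gamma}_s/2}{(\bar{\gamma}_s + 1) - (\rho_C\bar{\gamma}_s/\sqrt{2})}\right], \tag{6.91}$$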



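where $K$ is as before, $\rho_C$ is the signal correlation coefficient after a symbol time $T_s$, and $\bar{\gamma}_s$ is the average SNR per
symbol. Letting $\bar{\gamma}_s \to \infty$ yields the irreducible error floor:

$$\text{DQPSK:} \quad P_{floor} = \frac{1}{2}\left[1 - \sqrt{\frac{(\rho_C/\sqrt{2})^2}{1 - (\rho_C/\sqrt{2})^2}}\right]\exp\left[-\frac{(2 - \sqrt{2})(K/2)}{1 - \rho_C/\sqrt{2}}\right]. \tag{6.92}$$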



As discussed in Chapter 3.2.1, the channel correlation $A_C(\tau)$ over time $\tau$ equals the inverse Fourier transform
of the Doppler power spectrum $S_C(f)$ as a function of Doppler frequency $f$. The correlation coefficient is thus
$\rho_C = A_C(T)/A_C(0)$ evaluated at $T = T_s$ for DQPSK or $T = T_b$ for DPSK. Table 6.2, from [21], gives the
value of $\rho_C$ for several different Doppler power spectra models, where $B_D$ is the Doppler spread of the channel.
Assuming the uniform scattering model ($\rho_C = J_0(2\pi f_D T_b)$) and Rayleigh fading ($K = 0$) in (6.90) yields an
irreducible error for DPSK of

$$P_{floor} = \frac{1 - J_0(2\pi f_D T_b)}{2} \approx .5(\pi f_D T_b)^2, \tag{6.93}$$

where $B_D = f_D = v/\lambda$ is the maximum Doppler in the channel. Note that in this expression, the error floor
decreases with data rate $R = 1/T_b$. This is true in general for irreducible error floors of differential modulation
due to Doppler, since the channel has less time to decorrelate between transmitted symbols. This phenomenon is
one of the few instances in digital communications where performance improves as data rate increases.



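| Type | Doppler Power Spectrum $S_C(f)$ | $\rho_C = A_C(T)/A_C(0)$ |
|---|---|---|
| Rectangular | $\frac{P_0}{2B_D}$, $\lvert f\rvert < B_D$ | $\mathrm{sinc}(2B_D T)$ |
| Gaussian | $\frac{P_0}{\sqrt{\pi}B_D}e^{-f^2/B_D^2}$ | $e^{-(\pi B_D T)^2}$ |
| Uniform Scattering | $\frac{P_0}{\pi\sqrt{B_D^2 - f^2}}$, $\lvert f\rvert < B_D$ | $J_0(2\pi B_D T)$ |
| 1st Order Butterworth | $\frac{P_0 B_D}{\pi(f^2 + B_D^2)}$ | $e^{-2\pi B_D T}$ |

Table 6.2: Correlation Coefficients for Different Doppler Power Spectra Models.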

A plot of (6.88), the error probability of DPSK in fast Rician fading, for uniform scattering ($\rho_C = J_0(2\pi f_D T_b)$)
and different values of $f_D T_b$ is shown in Figure 6.4. We see from this figure that the error floor starts to dominate
at $\bar{\gamma}_b = 15$ dB in Rayleigh fading ($K = 0$), and as $K$ increases the value of $\bar{\gamma}_b$ where the error floor dominates also
increases. We also see that increasing the data rate $R_b = 1/T_b$ by an order of magnitude decreases the error floor
by roughly two orders of magnitude.



Figure 6.4: Average $P_b$ for DPSK in Fast Rician Fading with Uniform Scattering.

Example 6.7:

Assume a Rayleigh fading channel with uniform scattering and a maximum Doppler of $f_D = 80$ Hertz. For what
approximate range of data rates will the irreducible error floor of DPSK be below $10^{-4}$?

Solution: We have $P_{floor} \approx .5(\pi f_D T_b)^2 < 10^{-4}$. Solving for $T_b$ with $f_D = 80$ Hz, we get

$$T_b < \frac{\sqrt{2 \cdot 10^{-4}}}{\pi \cdot 80} = 5.63 \cdot 10^{-5},$$

which yields $R > 17.77$ Kbps.
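The error floor calculation of Example 6.7 can be sketched as follows. The code assumes Rayleigh fading with uniform scattering, so that the floor is $(1 - J_0(2\pi f_D T_b))/2$; since the Python standard library has no Bessel functions, $J_0$ is evaluated from its power series, which is adequate for the small arguments involved. All names are hypothetical.

```python
import math


def bessel_j0(x, terms=30):
    """Power series for J_0(x): sum over k of (-1)^k (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(x * x / 4.0) / ((k + 1) ** 2)
    return total


def dpsk_floor_uniform(f_d, t_b):
    """DPSK error floor for Rayleigh fading and uniform scattering: (1 - J0(2 pi f_D T_b)) / 2."""
    return 0.5 * (1.0 - bessel_j0(2.0 * math.pi * f_d * t_b))


def dpsk_floor_approx(f_d, t_b):
    """Small-argument approximation .5 (pi f_D T_b)^2."""
    return 0.5 * (math.pi * f_d * t_b) ** 2


def min_rate_for_floor(f_d, target):
    """Data rate above which the approximate floor stays below the target."""
    t_b_max = math.sqrt(2.0 * target) / (math.pi * f_d)
    return 1.0 / t_b_max


rate = min_rate_for_floor(80.0, 1e-4)
print(round(rate), dpsk_floor_uniform(80.0, 1.0 / rate))
```

Because the floor scales as $T_b^2$, raising the data rate by a factor of ten lowers the floor by a factor of one hundred, in line with the trend noted for Figure 6.4.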



Deriving analytical expressions for the irreducible error floor becomes intractable with more complex
modulations, in which case simulations are often used. In particular, simulations of the irreducible error floor for $\pi/4$
DQPSK with square root raised cosine filtering have been conducted since this modulation is used in the IS-54
TDMA standard [22, 23]. These simulation results indicate error floors between $10^{-3}$ and $10^{-4}$. As expected, in
these simulations the error floor increases with vehicle speed, since at higher vehicle speeds the channel
decorrelates more over a symbol time.

6.5 Intersymbol Interference 

Frequency-selective fading gives rise to ISI, where the received symbol over a given symbol period experiences 
interference from other symbols that have been delayed by multipath. Since increasing signal power also increases 
the power of the ISI, this interference gives rise to an irreducible error floor that is independent of signal power. 
The irreducible error floor is difficult to analyze, since it depends on the ISI characteristics and the modulation 
format, and the ISI characteristics depend on the characteristics of the channel and the sequence of transmitted 
symbols. 

The first extensive analysis of ISI degradation to symbol error probability was done by Bello and Nelin [24].
In that work analytical expressions for the irreducible error floor of coherent FSK and noncoherent DPSK are
determined assuming a Gaussian delay profile for the channel. To simplify the analysis, only ISI associated with
adjacent symbols was taken into account. Even with this simplification, the expressions are very complex and
must be approximated for evaluation. The irreducible error floor can also be evaluated analytically based on the
worst-case sequence of transmitted symbols or it can be averaged over all possible symbol sequences [25, Chapter
8.2]. These expressions are also complex to evaluate due to their dependence on the channel and symbol sequence
characteristics. An approximation to symbol error probability with ISI can be obtained by treating the ISI as
uncorrelated white Gaussian noise [28]. Then the SNR becomes

$$\hat{\gamma}_s = \frac{P_r}{N_0 B + I}, \tag{6.94}$$

where $I$ is the power associated with the ISI. In a static channel the resulting probability of symbol error will be
$P_s(\hat{\gamma}_s)$ where $P_s$ is the probability of symbol error in AWGN. If both the transmitted signal and the ISI experience
flat-fading, then $\hat{\gamma}_s$ will be a random variable with a distribution $p(\hat{\gamma}_s)$, and the average symbol error probability is
then $\bar{P}_s = \int P_s(\hat{\gamma}_s)\,p(\hat{\gamma}_s)\,d\hat{\gamma}_s$. Note that $\hat{\gamma}_s$ is the ratio of two random variables: the received power $P_r$ and the ISI
power $I$, and thus the resulting distribution $p(\hat{\gamma}_s)$ may be hard to obtain and is not in closed form.

Irreducible error floors due to ISI are often obtained by simulation, which can easily incorporate different
channel models, modulation formats, and symbol sequence characteristics [26, 28, 27, 22, 23]. The most extensive
simulations for determining irreducible error floor due to ISI were done by Chuang in [26]. In this work BPSK,
DPSK, QPSK, OQPSK and MSK modulations were simulated for different pulse shapes and for channels with
different power delay profiles, including a Gaussian, exponential, equal-amplitude two-ray, and empirical power
delay profile. The results of [26] indicate that the irreducible error floor is more sensitive to the rms delay spread of
the channel than to the shape of its power delay profile. Moreover, pulse shaping can significantly impact the error
floor: in the raised cosine pulses discussed in Chapter 5.5, increasing $\beta$ from zero to one can reduce the error floor
by over an order of magnitude. An example of Chuang’s simulation results is shown in Figure 6.5. This figure plots
the irreducible bit error rate as a function of normalized rms delay spread $d = \sigma_{T_m}/T_s$ for BPSK, QPSK, OQPSK,
and MSK modulation assuming a static channel with a Gaussian power delay profile. We see from this figure that
for all modulations, we can approximately bound the irreducible error floor as $P_{floor} \leq d^2$ for $.02 \leq d \leq .1$. Other
simulation results support this bound as well [28]. This bound imposes severe constraints on data rate even when
symbol error probabilities on the order of $10^{-2}$ are acceptable. For example, the rms delay spread in a typical
urban environment is approximately $\sigma_{T_m} = 2.5\,\mu$sec. To keep $\sigma_{T_m} < .1T_s$ requires that the data rate not exceed
40 Kbaud, which generally isn’t enough for high-speed data applications. In rural environments, where multipath
is not attenuated to the same degree as in cities, $\sigma_{T_m} \approx 25\,\mu$sec, which reduces the maximum data rate to 4 Kbaud.



Example 6.8: 

Using the approximation $P_{floor} \le (\sigma_{T_m}/T_s)^2$, find the maximum data rate that can be transmitted through a channel
with delay spread $\sigma_{T_m} = 3$ $\mu$sec using either BPSK or QPSK modulation such that the probability of bit error
$P_b$ is less than $10^{-3}$.

Solution: For BPSK, we set $P_{floor} = (\sigma_{T_m}/T_b)^2$, so we require $T_b \ge \sigma_{T_m}/\sqrt{P_{floor}} = 94.87$ $\mu$secs, which leads to
a data rate of $R = 1/T_b = 10.54$ Kbps. For QPSK, the same calculation yields $T_s \ge \sigma_{T_m}/\sqrt{P_{floor}} = 94.87$ $\mu$secs.
Since there are 2 bits per symbol, this leads to a data rate of $R = 2/T_s = 21.08$ Kbps. This indicates that for a
given data rate, QPSK is more robust to ISI than BPSK, due to the fact that its symbol time is longer. This result
is also true using the more accurate error floors associated with Figure 6.5 rather than the bound in this example.
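
The arithmetic behind this bound is easy to script. The following is a minimal sketch, assuming only the approximation $P_{floor} \le (\sigma_{T_m}/T_s)^2$ used in the example; the function names `max_symbol_time` and `max_bit_rate` are our own illustrative choices and do not come from the text.

```python
import math

def max_symbol_time(sigma_tm, p_floor):
    """Smallest symbol time T_s (seconds) for which (sigma_tm / T_s)**2 <= p_floor."""
    return sigma_tm / math.sqrt(p_floor)

def max_bit_rate(sigma_tm, p_floor, bits_per_symbol):
    """Largest bit rate (bits/s) allowed by the error-floor bound."""
    return bits_per_symbol / max_symbol_time(sigma_tm, p_floor)

# Numbers from the example: 3 microsecond rms delay spread, floor of 1e-3.
sigma_tm = 3e-6
p_floor = 1e-3
t_s = max_symbol_time(sigma_tm, p_floor)
rate_bpsk = max_bit_rate(sigma_tm, p_floor, bits_per_symbol=1)
rate_qpsk = max_bit_rate(sigma_tm, p_floor, bits_per_symbol=2)

print(f"T_s  >= {t_s * 1e6:.2f} microseconds")
print(f"BPSK <= {rate_bpsk / 1e3:.2f} Kbps")
print(f"QPSK <= {rate_qpsk / 1e3:.2f} Kbps")
```

The same two functions reproduce the urban and rural symbol-rate limits quoted earlier when called with a floor of $10^{-2}$ (that is, $d = .1$) and one bit per symbol.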

Figure 6.5: Irreducible error versus normalized rms delay spread $d = \sigma_{T_m}/T_s$ for Gaussian power delay profile
(from [26] ©IEEE).

Chapter 6 Problems

1. Consider a system in which data is transferred at a rate of 100 bits/sec over the channel. 

(a) Find the symbol duration if we use sinc pulses for signalling and the channel bandwidth is 10 kHz.

(b) If the received SNR is 10 dB, find the SNR per symbol and the SNR per bit if 4-QAM is used.

(c) Find the SNR per symbol and the SNR per bit for 16-QAM and compare with these metrics for 4-QAM. 

2. Consider BPSK modulation where the a priori probability of 0 and 1 is not the same. Specifically $p[s_n = 0] = 0.3$
and $p[s_n = 1] = 0.7$.

(a) Find the probability of bit error $P_b$ in AWGN assuming we encode a 1 as $s_1(t) = A\cos(2\pi f_c t)$ and a
0 as $s_2(t) = -A\cos(2\pi f_c t)$, and the receiver structure is as shown in Figure 5.17.

(b) Suppose you can change the threshold value in the receiver of Figure 5.17. Find the threshold value
that yields equal error probability regardless of which bit is transmitted, i.e. the threshold value that
yields $p(\hat{m} = 0 | m = 1)p(m = 1) = p(\hat{m} = 1 | m = 0)p(m = 0)$.

(c) Now suppose we change the modulation so that $s_1(t) = A\cos(2\pi f_c t)$ and $s_2(t) = -B\cos(2\pi f_c t)$.
Find $A$ and $B$ so that the receiver of Figure 5.17 with threshold at zero has
$p(\hat{m} = 0 | m = 1)p(m = 1) = p(\hat{m} = 1 | m = 0)p(m = 0)$.

(d) Compute and compare the expression for $P_b$ in parts (a), (b) and (c) assuming $E_b/N_0 = 10$ dB. For
which system is $P_b$ minimized?

3. Consider a BPSK receiver where the demodulator has a phase offset of $\phi$ relative to the transmitted signal, so
for a transmitted signal $s(t) = \pm g(t)\cos(2\pi f_c t)$, the carrier in the demodulator of Figure 5.17 is $\cos(2\pi f_c t + \phi)$.
Determine the threshold level in the threshold device of Figure 5.17 that minimizes probability of bit
error, and find this minimum error probability.

4. Assume a BPSK demodulator where the receiver noise is added after the integrator, as shown in the figure
below. The decision device outputs a “1” if its input $x$ has $\Re\{x\} \ge 0$, and a “0” otherwise. Suppose the tone
jammer $n(t) = 1.1e^{j\theta}$, where $p(\theta = n\pi/3) = 1/6$ for $n = 0, 1, 2, 3, 4, 5$. What is the probability of making
a decision error in the decision device, i.e. outputting the wrong demodulated bit, assuming $A_c = \sqrt{2/T_b}$
and that information bits corresponding to a “1” ($s(t) = A_c\cos(2\pi f_c t)$) or a “0” ($s(t) = -A_c\cos(2\pi f_c t)$)
are equally likely?

[Figure: BPSK demodulator with carrier $\cos(2\pi f_c t)$; the jammer $n(t)$ is added after the integrator.]

5. Find an approximation to $P_s$ for the following signal constellations:

6. Plot the exact symbol error probability and the approximation from Table 6.1 for 16QAM with $0 \le \gamma_s \le 30$
dB. Does the error in the approximation increase or decrease with $\gamma_s$, and why?

7. Plot the symbol error probability $P_s$ for QPSK using the approximation in Table 6.1 and Craig’s exact result
for $0 \le \gamma_s \le 30$ dB. Does the error in the approximation increase or decrease with $\gamma_s$, and why?

8. In this problem we derive an algebraic proof of the alternate representation of the Q-function (6.43) from its
original representation (6.42). We will work with the complementary error function (erfc) for simplicity and
make the conversion at the end. The erfc($x$) function is traditionally defined by

$$\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt. \tag{6.95}$$

The alternate representation of this, corresponding to the alternate representation of the Q-function (6.43),
is

$$\mathrm{erfc}(x) = \frac{2}{\pi} \int_0^{\pi/2} e^{-x^2/\sin^2\theta}\, d\theta. \tag{6.96}$$

(a) Consider the integral

$$I_x(a) = \int_0^{\infty} \frac{e^{-at^2}}{x^2 + t^2}\, dt. \tag{6.97}$$

Show that $I_x(a)$ satisfies the following differential equation:

$$x^2 I_x(a) - \frac{\partial I_x(a)}{\partial a} = \frac{1}{2}\sqrt{\frac{\pi}{a}}. \tag{6.98}$$

(b) Solve the differential equation (6.98) and deduce that

$$I_x(a) = \int_0^{\infty} \frac{e^{-at^2}}{x^2 + t^2}\, dt = \frac{\pi}{2x}\, e^{ax^2}\, \mathrm{erfc}(x\sqrt{a}). \tag{6.99}$$

Hint: $I_x(a)$ is a function in two variables $x$ and $a$. However, since all our manipulations deal with $a$
only, you can assume $x$ to be a constant while solving the differential equation.

(c) Setting $a = 1$ in (6.99) and making a suitable change of variables in the LHS of (6.99), derive the
alternate representation of the erfc function:

$$\mathrm{erfc}(x) = \frac{2}{\pi} \int_0^{\pi/2} e^{-x^2/\sin^2\theta}\, d\theta.$$

(d) Convert this alternate representation of the erfc function to the alternate representation of the Q-function.

9. Consider a communication system which uses BPSK signalling with average signal power of 100 Watts and 
the noise power at the receiver is 4 Watts. Can this system be used for transmission of data? Can it be used 
for voice? Now consider there is fading with an average SNR $\bar{\gamma}_b = 20$ dB. How does your answer to the
above question change? 

10. Consider a cellular system at 900 MHz with a transmission rate of 64 Kbps and multipath fading. Explain 
which performance metric, average probability of error or outage probability, is more appropriate and why 
for user speeds of 1 mph, 10 mph, and 100 mph. 

11. Derive the expression for the moment generating function for Rayleigh fading.

12. This problem illustrates why satellite systems that must compensate for shadow fading are going bankrupt. 
Consider a LEO satellite system orbiting 500 Km above the earth. Assume the signal follows a free space 
path loss model with no multipath fading or shadowing. The transmitted signal has a carrier frequency of 900 
MHz and a bandwidth of 10 KHz. The handheld receivers have noise spectral density of $10^{-16}$ mW/Hz (total noise
power is $N_0 B$). Assume nondirectional antennas (0 dB gain) at both the transmitter and receiver.
Suppose the satellite must support users in a circular cell on the earth of radius 100 Km at a BER of $10^{-6}$.

(a) For DPSK modulation what transmit power is needed such that all users in the cell meet the $10^{-6}$ BER
target?

(b) Repeat part (a) assuming that the channel also experiences log normal shadowing with $\sigma = 8$ dB, and
that users in a cell must have $P_b = 10^{-6}$ (for each bit) with probability 0.9.

13. In this problem we explore the power penalty involved in going to higher level signal modulations, i.e. from 
BPSK to 16PSK. 



(a) Find the minimum distance between constellation points in 16PSK modulation as a function of signal 
energy E s . 

(b) Find $\alpha_M$ and $\beta_M$ such that the symbol error probability of 16PSK in AWGN is approximately

$$P_s \approx \alpha_M Q\left(\sqrt{\beta_M \gamma_s}\right).$$

(c) Using your expression in part (b), find an approximation for the average symbol error probability of
16PSK in Rayleigh fading in terms of $\bar{\gamma}_s$.

(d) Convert the expressions for average symbol error probability of 16PSK in Rayleigh fading to expressions
for average bit error probability assuming Gray coding.

(e) Find the approximate value of $\bar{\gamma}_b$ required to obtain a BER of $10^{-3}$ in Rayleigh fading for BPSK and
16PSK. What is the power penalty in going to the higher level signal constellation at this BER?

14. Find a closed-form expression for the average probability of error for DPSK modulation in Nakagami-m
fading, and evaluate it for $m = 4$ and $\bar{\gamma}_b = 10$ dB.

15. The Nakagami distribution is parameterized by $m$, which ranges from $m = .5$ to $m = \infty$. The $m$ parameter
measures the ratio of LOS signal power to multipath power, so $m = 1$ corresponds to Rayleigh fading,
$m = \infty$ corresponds to an AWGN channel with no fading, and $m = .5$ corresponds to fading that results in
performance that is worse than with a Rayleigh distribution. In this problem we explore the impact of the
parameter $m$ on the performance of BPSK modulation in Nakagami fading.

Plot the average bit error $\bar{P}_b$ of BPSK modulation in Nakagami fading with average SNR ranging from 0 to
20 dB for $m$ parameters $m = 1$ (Rayleigh), $m = 2$, and $m = 4$. (The Moment Generating Function technique
of Section 6.3.3 should be used to obtain the average error probability.) At an average SNR of 10 dB, what
is the difference in average BER?

16. Assume a cellular system with log-normal shadowing plus Rayleigh fading. The signal modulation is DPSK. 
The service provider has determined that it can deal with an outage probability of .01, i.e. 1 in 100 customers 
are unhappy at any given time. In nonoutage the voice BER requirement is $P_b = 10^{-3}$. Assume a noise
power spectral density of $N_0 = 10^{-16}$ mW/Hz, a signal bandwidth of 30 KHz, a carrier frequency of 900
MHz, free space path loss propagation with nondirectional antennas, and shadowing standard deviation of
$\sigma = 6$ dB. Find the maximum cell size that can achieve this performance if the transmit power at the mobiles
is limited to 100 mW. 

17. Consider a cellular system with circular cells with radius equal to 100 meters. Assume propagation follows 
the simplified path loss model with $K = 1$, $d_0 = 1$ m, and $\gamma = 3$. Assume the signal experiences log-normal
shadowing on top of path loss with $\sigma_{\psi_{dB}} = 4$ as well as Rayleigh fading. The transmit power at the base
station is $P_t = 100$ mW, the system bandwidth is $B = 30$ KHz, and the noise PSD is $N_0 = 10^{-14}$ W/Hz.
Assuming BPSK modulation, we want to find the cell coverage area (percentage of locations in the cell)
where users have average $\bar{P}_b$ less than $10^{-3}$.

(a) Find the received power due to path loss at the cell boundary.

(b) Find the minimum average received power (due to path loss and shadowing) such that with Rayleigh
fading about this average, a BPSK modulated signal with this average received power at a given cell
location has $\bar{P}_b < 10^{-4}$.

(c) Given the propagation model for this system (simplified path loss, shadowing, and Rayleigh fading),
find the percentage of locations in the cell where under BPSK modulation, $\bar{P}_b < 10^{-4}$.

18. In this problem we derive the probability of bit error for DPSK in fast Rayleigh fading. By symmetry, the
probability of error is the same for transmitting a zero bit or a one bit. Let us assume that over time $kT_b$ a
zero bit is transmitted, so the transmitted symbol at time $kT_b$ is the same as at time $k - 1$: $s(k) = s(k - 1)$.
In fast fading the corresponding received symbols are $z(k-1) = g_{k-1}s(k-1) + n(k-1)$ and
$z(k) = g_k s(k-1) + n(k)$, where $g_{k-1}$ and $g_k$ are the fading channel gains associated with transmissions over times
$(k-1)T_b$ and $kT_b$.

a) Show that the decision variable input to the phase comparator of Figure 5.20 to extract the phase difference
is $z(k)z^*(k-1) = g_k g_{k-1}^* + g_k s(k-1) n_{k-1}^* + g_{k-1}^* s^*(k-1) n_k + n_k n_{k-1}^*$.

Assuming a reasonable SNR, the last term $n_k n_{k-1}^*$ of this expression can be neglected. Neglecting this term
and defining $\tilde{n}_k = s_{k-1}^* n_k$ and $\tilde{n}_{k-1} = s_{k-1}^* n_{k-1}$, we get a new random variable
$\tilde{z} = g_k g_{k-1}^* + g_k \tilde{n}_{k-1}^* + g_{k-1}^* \tilde{n}_k$. Given that a zero bit was transmitted over time $kT_b$, an error is made if
$x = \Re\{\tilde{z}\} < 0$, so we must determine the distribution of $x$. The characteristic function for $x$ is the 2-sided
Laplace transform of the pdf of $x$:

$$\Phi_X(s) = \int_{-\infty}^{\infty} p_X(x) e^{-sx}\, dx = E[e^{-sx}].$$

This function will have a left plane pole $p_1$ and a right plane pole $p_2$, so can be written as

$$\Phi_X(s) = \frac{p_1 p_2}{(s - p_1)(s - p_2)}.$$

The left plane pole $p_1$ corresponds to the pdf $p_X(x)$ for $x \ge 0$ and the right plane pole corresponds to the
pdf $p_X(x)$ for $x < 0$.

b) Show through partial fraction expansion that $\Phi_X(s)$ can be written as

$$\Phi_X(s) = \frac{p_1 p_2}{(p_1 - p_2)} \frac{1}{(s - p_1)} + \frac{p_1 p_2}{(p_2 - p_1)} \frac{1}{(s - p_2)}.$$

An error is made if $x = \Re\{\tilde{z}\} < 0$, so we need only consider the pdf $p_X(x)$ for $x < 0$ corresponding to the
second term of $\Phi_X(s)$ in part b).

c) Show that the inverse Laplace transform of the second term of $\Phi_X(s)$ from part b) is

$$p_X(x) = \frac{p_1 p_2}{p_2 - p_1} e^{p_2 x}, \quad x < 0.$$

d) Use part c) to show that $P_b = -p_1/(p_2 - p_1)$.

In $x = \Re\{\tilde{z}\} = \Re\{g_k g_{k-1}^* + g_k \tilde{n}_{k-1}^* + g_{k-1}^* \tilde{n}_k\}$ the channel gains $g_k$ and $g_{k-1}$ and noises $\tilde{n}_k$ and $\tilde{n}_{k-1}$
are complex Gaussian random variables. Thus, the poles $p_1$ and $p_2$ in $p_X(x)$ are derived using the general
quadratic form of complex Gaussian random variables [1, Appendix B] [18, Appendix B] as



$$p_1 = \frac{-1}{2(\bar{\gamma}_b[1 + \rho_C] + N_0)}, \qquad p_2 = \frac{1}{2(\bar{\gamma}_b[1 - \rho_C] + N_0)},$$

for $\rho_C$ the correlation coefficient of the channel over the bit time $T_b$.

e) Find a general expression for $\bar{P}_b$ in fast Rayleigh fading using these values of $p_1$ and $p_2$ in the $P_b$
expression from part d).

f) Show that this reduces to the average probability of error $\bar{P}_b = \frac{1}{2(1 + \bar{\gamma}_b)}$ for a slowly fading channel that
does not decorrelate over a bit time.



19. Plot the bit error probability for DPSK in fast Rayleigh fading for $\bar{\gamma}_b$ ranging from 0 to 60 dB and
$\rho_C = J_0(2\pi B_D T)$ with $B_D T = .01$, $.001$, and $.0001$. For each value of $B_D T$, at approximately what value of $\bar{\gamma}_b$
does the error floor dominate the error probability?

20. Find the irreducible error floor for DQPSK modulation due to Doppler, assuming a Gaussian Doppler power
spectrum with $B_D = 80$ Hz and Rician fading with $K = 2$.

21. Consider a wireless channel with an average delay spread of 100 nsec and a Doppler spread of 80 Hz. Given
the error floors due to Doppler and ISI, for DQPSK modulation in Rayleigh fading and uniform scattering,
approximately what range of data rates can be transmitted over this channel with a BER less than $10^{-4}$?

22. Using the error floors of Figure 6.5, find the maximum data rate that can be transmitted through a channel
with delay spread $\sigma_{T_m} = 3$ $\mu$sec using BPSK, QPSK, or MSK modulation such that the probability of bit
error $P_b$ is less than $10^{-3}$.







Chapter 7 

Diversity 



In Chapter 6 we saw that both Rayleigh fading and log normal shadowing induce a very large power penalty on the 
performance of modulation over wireless channels. One of the most powerful techniques to mitigate the effects of 
fading is to use diversity-combining of independently fading signal paths. Diversity-combining uses the fact that 
independent signal paths have a low probability of experiencing deep fades simultaneously. Thus, the idea behind 
diversity is to send the same data over independent fading paths. These independent paths are combined in some
way such that the fading of the resultant signal is reduced. For example, consider a system with two antennas
at either the transmitter or receiver that experience independent fading. If the antennas are spaced sufficiently
far apart, it is unlikely that they both experience deep fades at the same time. By selecting the antenna with
the strongest signal, called selection combining, we obtain a much better signal than if we just had one antenna.
This chapter focuses on common techniques at the transmitter and receiver to achieve diversity. Other diversity
techniques that have potential benefits over these schemes in terms of performance or complexity are discussed in
[1, Chapter 9.10]. 

Diversity techniques that mitigate the effect of multipath fading are called microdiversity, and that is the focus
of this chapter. Diversity to mitigate the effects of shadowing from buildings and objects is called macrodiversity. 
Macrodiversity is generally implemented by combining signals received by several base stations or access points. 
This requires coordination among the different base stations or access points. Such coordination is implemented 
as part of the networking protocols in infrastructure-based wireless networks. We will therefore defer discussion 
of macrodiversity until Chapter 15, where we discuss the design of such networks. 

7.1 Realization of Independent Fading Paths 

There are many ways of achieving independent fading paths in a wireless system. One method is to use multiple
transmit or receive antennas, also called an antenna array, where the elements of the array are separated in distance.
This type of diversity is referred to as space diversity. Note that with receiver space diversity, independent fading
paths are realized without an increase in transmit signal power or bandwidth. Moreover, coherent combining
of the diversity signals leads to an increase in SNR at the receiver over the SNR that would be obtained with
just a single receive antenna, which we discuss in more detail below. Conversely, to obtain independent paths
through transmitter space diversity, the transmit power must be divided among multiple antennas. Thus, with
coherent combining of the transmit signals the received SNR is the same as if there were just a single transmit
antenna. Space diversity also requires that the separation between antennas be such that the fading amplitudes
corresponding to each antenna are approximately independent. For example, from (3.26) in Chapter 3, in a uniform
scattering environment with isotropic transmit and receive antennas the minimum antenna separation required for
independent fading on each antenna is approximately one half wavelength ($.38\lambda$ to be exact). If the transmit or
receive antennas are directional (which is common at the base station if the system has cell sectorization), then
the multipath is confined to a small angle relative to the LOS ray, which means that a larger antenna separation is
required to get independent fading samples [2].
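
The $.38\lambda$ figure can be checked numerically. The sketch below assumes that the spatial correlation in uniform scattering is proportional to the zeroth-order Bessel function $J_0(2\pi d/\lambda)$, which is the standard result for this model, so that the first zero of $J_0$ gives the smallest spacing with uncorrelated fading. The 900 MHz carrier and the helper names `bessel_j0` and `first_zero` are illustrative assumptions, not values or functions from the text.

```python
import math

def bessel_j0(x, steps=2000):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin(theta)), midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(steps))
    return total * h / math.pi

def first_zero(f, lo, hi, tol=1e-10):
    """Bisection for a root of f in [lo, hi]; assumes f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Spacing in wavelengths at which the assumed correlation J0(2*pi*d) first vanishes.
d_over_lambda = first_zero(lambda d: bessel_j0(2 * math.pi * d), 0.1, 0.6)

carrier_hz = 900e6                           # assumed carrier frequency (illustrative)
wavelength_m = 299_792_458.0 / carrier_hz    # speed of light / frequency
spacing_m = d_over_lambda * wavelength_m

print(f"d / lambda = {d_over_lambda:.4f}")
print(f"spacing at 900 MHz = {spacing_m * 100:.1f} cm")
```

Zero correlation is used here as a proxy for independence, which is exact only for jointly Gaussian fading components.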

A second method of achieving diversity is by using either two transmit antennas or two receive antennas with 
different polarization (e.g. vertically and horizontally polarized waves). The two transmitted waves follow the 
same path. However, since the multiple random reflections distribute the power nearly equally relative to both 
polarizations, the average receive power corresponding to either polarized antenna is approximately the same. 
Since the scattering angle relative to each polarization is random, it is highly improbable that signals received 
on the two differently polarized antennas would be simultaneously in deep fades. There are two disadvantages
of polarization diversity. First, you can have at most two diversity branches, corresponding to the two types of 
polarization. The second disadvantage is that polarization diversity loses effectively half the power (3 dB) since 
the transmit or receive power is divided between the two differently polarized antennas. 

Directional antennas provide angle, or directional, diversity by restricting the receive antenna beamwidth to 
a given angle. In the extreme, if the angle is very small then at most one of the multipath rays will fall within the 
receive beamwidth, so there is no multipath fading from multiple rays. However, this diversity technique requires 
either a sufficient number of directional antennas to span all possible directions of arrival or a single antenna whose 
directivity can be steered to the angle of arrival of one of the multipath components (preferably the strongest one). 
Note also that with this technique the SNR may decrease due to the loss of multipath components that fall outside 
the receive antenna beamwidth, unless the directional gain of the antenna is sufficiently large to compensate for 
this lost power. Smart antennas are antenna arrays with adjustable phase at each antenna element: such arrays
form directional antennas that can be steered to the incoming angle of the strongest multipath component [3].

Frequency diversity is achieved by transmitting the same narrowband signal at different carrier frequencies, 
where the carriers are separated by the coherence bandwidth of the channel. This technique requires additional
transmit power to send the signal over multiple frequency bands. Spread spectrum techniques, discussed in Chapter
13, are sometimes described as providing frequency diversity since the channel gain varies across the bandwidth
of the transmitted signal. However, this is not equivalent to sending the same information signal over independently
fading paths. As discussed in Chapter 13.2.4, spread spectrum with RAKE reception does provide independently 
fading paths of the information signal and thus is a form of frequency diversity. Time diversity is achieved by 
transmitting the same signal at different times, where the time difference is greater than the channel coherence 
time (the inverse of the channel Doppler spread). Time diversity does not require increased transmit power, but it 
does decrease the data rate since data is repeated in the diversity time slots rather than sending new data in these 
time slots. Time diversity can also be achieved through coding and interleaving, as will be discussed in Chapter 
8. Clearly time diversity can’t be used for stationary applications, since the channel coherence time is infinite and 
thus fading is highly correlated over time. 

In this chapter we will focus on space diversity as a reference to describe the diversity systems and the 
different combining techniques, although the combining techniques can be applied to any type of diversity. Thus, 
the combining techniques will be defined as operations on an antenna array. Receiver and transmitter diversity are
treated separately, since the system models and diversity combining techniques for each have important differences. 



7.2 Receiver Diversity 

7.2.1 System Model 

In receiver diversity the independent fading paths associated with multiple receive antennas are combined to obtain
a resultant signal that is then passed through a standard demodulator. The combining can be done in several ways
which vary in complexity and overall performance. Most combining techniques are linear: the output of the
combiner is just a weighted sum of the different fading paths or branches, as shown in Figure 7.1 for $M$-branch
diversity. Specifically, when all but one of the complex $\alpha_i$s are zero, only one path is passed to the combiner
output. When more than one of the $\alpha_i$s is nonzero, the combiner adds together multiple paths, where each path
may be weighted by a different value. Combining more than one branch signal requires co-phasing, where the phase
$\theta_i$ of the $i$th branch is removed through multiplication by $\alpha_i = a_i e^{-j\theta_i}$ for some real-valued $a_i$. This phase
removal requires coherent detection of each branch to determine its phase $\theta_i$. Without co-phasing, the branch
signals would not add up coherently in the combiner, so the resulting output could still exhibit significant fading
due to constructive and destructive addition of the signals in all the branches.
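
To make the effect of co-phasing concrete, the following sketch compares a linear combiner with weights $\alpha_i = a_i e^{-j\theta_i}$ against one that applies the real weights $a_i$ without removing the branch phases. The branch amplitudes and phases are arbitrary illustrative values, and the function names are our own.

```python
import cmath
import math

def combine(branches, weights):
    """Linear combiner: weighted sum of complex branch gains r_i * exp(j*theta_i)."""
    return sum(w * b for w, b in zip(weights, branches))

def cophasing_weights(branches, amplitudes):
    """alpha_i = a_i * exp(-j*theta_i), where theta_i is the phase of branch i."""
    return [a * cmath.exp(-1j * cmath.phase(b)) for a, b in zip(amplitudes, branches)]

# Illustrative branch amplitudes r_i and phases theta_i (not from the text).
r = [1.0, 0.8, 0.5]
theta = [0.3, math.pi + 0.3, 2.0]
branches = [ri * cmath.exp(1j * ti) for ri, ti in zip(r, theta)]
a = [1.0, 1.0, 1.0]   # real-valued weights a_i

cophased = combine(branches, cophasing_weights(branches, a))
uncophased = combine(branches, a)

print(f"co-phased amplitude:     {abs(cophased):.3f}")    # equals sum of a_i * r_i
print(f"non-co-phased amplitude: {abs(uncophased):.3f}")  # partial cancellation
```

With co-phasing the output amplitude is $\sum_i a_i r_i$; without it the first two branches, which are nearly in antiphase here, largely cancel.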

The multiplication by $\alpha_i$ can be performed either before detection (predetection) or after detection (post-detection)
with essentially no difference in performance. Combining is typically performed post-detection, since
the branch signal power and/or phase is required to determine the appropriate $\alpha_i$ value. Post-detection combining
of multiple branches requires a dedicated receiver for each branch to determine the branch phase, which increases
the hardware complexity and power consumption, particularly for a large number of branches.



[Diagram: branch signals $r_1 e^{j\theta_1}s(t)$, $r_2 e^{j\theta_2}s(t)$, $r_3 e^{j\theta_3}s(t)$, ..., $r_M e^{j\theta_M}s(t)$ entering the combiner.]




Figure 7.1: Linear Combiner. 

The main purpose of diversity is to coherently combine the independent fading paths so that the effects of 
fading are mitigated. The signal output from the combiner equals the original transmitted signal s(t) multiplied by 
a random complex amplitude term $\sum_i a_i r_i$. This complex amplitude term results in a random SNR $\gamma_\Sigma$ at the
combiner output, where the distribution of $\gamma_\Sigma$ is a function of the number of diversity paths, the fading distribution
on each path, and the combining technique, as shown in more detail below. 

There are two types of performance gain associated with receiver space diversity: array gain and diversity
gain. The array gain results from coherent combining of multiple receive signals. Even in the absence of fading,
this can lead to an increase in average received SNR. For example, suppose there is no fading so that $r_i = \sqrt{E_s}$
for $E_s$ the energy per symbol of the transmitted signal. Assume identical noise PSD $N_0$ on each branch and pulse
shaping such that $BT_s = 1$. Then each branch has the same SNR $\gamma_i = E_s/N_0$. Let us set $a_i = r_i/\sqrt{N_0}$: we will
see later that these weights are optimal for maximal-ratio combining in fading. Then the received SNR is

$$\gamma_\Sigma = \frac{\left(\sum_{i=1}^{M} a_i r_i\right)^2}{N_0 \sum_{i=1}^{M} a_i^2} = \frac{\left(\sum_{i=1}^{M} E_s/\sqrt{N_0}\right)^2}{N_0 \sum_{i=1}^{M} E_s/N_0} = \frac{M E_s}{N_0}. \tag{7.1}$$

Thus, in the absence of fading, with appropriate weighting there is an $M$-fold increase in SNR due to the coherent
combining of the $M$ signals received from the different antennas. This SNR increase in the absence of fading is
referred to as the array gain. More precisely, array gain $A_g$ is defined as the increase in averaged combined SNR
$\bar{\gamma}_\Sigma$ over the average branch SNR $\bar{\gamma}$:

$$A_g = \frac{\bar{\gamma}_\Sigma}{\bar{\gamma}}.$$

Array gain occurs for all diversity combining techniques, but is most pronounced in MRC. Both diversity and
array gain occur in transmit diversity as well. The array gain allows a system with multiple transmit or receive
antennas in a fading channel to achieve better performance than a system without diversity in an AWGN channel
with the same average SNR. We will see this effect in performance curves for MRC and EGC with a large number
of antennas.
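
The $M$-fold SNR increase of (7.1) can be confirmed with a few lines of code. This is a minimal sketch of the no-fading case only; the values of $E_s$ and $N_0$ are arbitrary illustrative choices and `combiner_snr` is a name of our own.

```python
import math

def combiner_snr(a, r, n0):
    """Output SNR of a co-phased linear combiner with real weights a_i, as in (7.1)."""
    signal = sum(ai * ri for ai, ri in zip(a, r)) ** 2
    noise = n0 * sum(ai ** 2 for ai in a)
    return signal / noise

es, n0 = 2.0, 0.5            # illustrative symbol energy and noise PSD
branch_snr = es / n0         # SNR of each branch without combining

gains = {}
for m in (1, 2, 4, 8):
    r = [math.sqrt(es)] * m                    # no fading: r_i = sqrt(E_s)
    a = [ri / math.sqrt(n0) for ri in r]       # weights a_i = r_i / sqrt(N_0)
    gains[m] = combiner_snr(a, r, n0) / branch_snr
    print(f"M = {m}: array gain = {gains[m]:.2f}")

# Passing only one branch (all other weights zero) gives no array gain.
single = combiner_snr([1.0, 0.0, 0.0, 0.0], [math.sqrt(es)] * 4, n0) / branch_snr
```

Because all branches are identical in this no-fading case, any equal weighting gives the same factor of $M$; the particular choice $a_i = r_i/\sqrt{N_0}$ matters once the $r_i$ differ.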

In fading the combining of multiple independent fading paths leads to a more favorable distribution for $\gamma_\Sigma$
than would be the case with just a single path. In particular, the performance of a diversity system, whether it uses
space diversity or another form of diversity, in terms of $\bar{P}_s$ and $P_{out}$ is as defined in Sections 6.3.1 and 6.3.2:

$$\bar{P}_s = \int_0^{\infty} P_s(\gamma)\, p_{\gamma_\Sigma}(\gamma)\, d\gamma, \tag{7.2}$$

where $P_s(\gamma)$ is the probability of symbol error for demodulation of $s(t)$ in AWGN with SNR $\gamma$, and

$$P_{out} = p(\gamma_\Sigma \le \gamma_0) = \int_0^{\gamma_0} p_{\gamma_\Sigma}(\gamma)\, d\gamma, \tag{7.3}$$

for some target SNR value $\gamma_0$. The more favorable distribution for $\gamma_\Sigma$ leads to a decrease in $\bar{P}_s$ and $P_{out}$ due to
diversity combining, and the resulting performance advantage is called the diversity gain. In particular, for some
diversity systems their average probability of error can be expressed in the form $\bar{P}_s = c\bar{\gamma}^{-M}$, where $c$ is a constant
that depends on the specific modulation and coding, $\bar{\gamma}$ is the average received SNR per branch, and $M$ is called
the diversity order of the system. The diversity order indicates how the slope of the average probability of error
as a function of average SNR changes with diversity. Figures 7.3 and 7.6 below show these slope changes as a
function of $M$ for different combining techniques. Recall from (6.61) that a general approximation for average
error probability in Rayleigh fading with no diversity is $\bar{P}_s \approx \alpha_M/(2\beta_M\bar{\gamma})$. This expression has a diversity order
of one, consistent with a single receive antenna. The maximum diversity order of a system with $M$ antennas is $M$,
and when the diversity order equals $M$ the system is said to achieve full diversity order.

In the following subsections we will describe the different combining techniques and their performance in 
more detail. These techniques entail various tradeoffs between performance and complexity. 



7.2.2 Selection Combining 

In selection combining (SC), the combiner outputs the signal on the branch with the highest SNR $r_i^2/N_i$. This is
equivalent to choosing the branch with the highest $r_i^2 + N_i$ if the noise power $N_i = N$ is the same on all branches.¹
Since only one branch is used at a time, SC often requires just one receiver that is switched into the active
antenna branch. However, a dedicated receiver on each antenna branch may be needed for systems that transmit
continuously in order to simultaneously and continuously monitor SNR on each branch. With SC the path output
from the combiner has an SNR equal to the maximum SNR of all the branches. Moreover, since only one branch
output is used, co-phasing of multiple branches is not required, so this technique can be used with either coherent
or differential modulation.

¹In practice $r_i^2 + N_i$ is easier to measure than SNR since it just entails finding the total power in the received signal.
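
For $M$ branch diversity, the CDF of $\gamma_\Sigma$ is given by

$$P_{\gamma_\Sigma}(\gamma) = p(\gamma_\Sigma < \gamma) = p(\max[\gamma_1, \gamma_2, \ldots, \gamma_M] < \gamma) = \prod_{i=1}^{M} p(\gamma_i < \gamma). \tag{7.4}$$

We obtain the pdf of $\gamma_\Sigma$ by differentiating $P_{\gamma_\Sigma}(\gamma)$ relative to $\gamma$, and the outage probability by evaluating $P_{\gamma_\Sigma}(\gamma)$
at $\gamma = \gamma_0$. Assume that we have $M$ branches with uncorrelated Rayleigh fading amplitudes $r_i$. The instantaneous
SNR on the $i$th branch is therefore given by $\gamma_i = r_i^2/N$. Defining the average SNR on the $i$th branch as $\bar{\gamma}_i = E[\gamma_i]$,
the SNR distribution will be exponential:

$$p(\gamma_i) = \frac{1}{\bar{\gamma}_i} e^{-\gamma_i/\bar{\gamma}_i}. \tag{7.5}$$

From (6.47), the outage probability for a target $\gamma_0$ on the $i$th branch in Rayleigh fading is

$$P_{out}(\gamma_0) = 1 - e^{-\gamma_0/\bar{\gamma}_i}. \tag{7.6}$$

The outage probability of the selection-combiner for the target $\gamma_0$ is then

$$P_{out}(\gamma_0) = \prod_{i=1}^{M} p(\gamma_i < \gamma_0) = \prod_{i=1}^{M} \left[1 - e^{-\gamma_0/\bar{\gamma}_i}\right]. \tag{7.7}$$

If the average SNR is the same for all of the branches ($\bar{\gamma}_i = \bar{\gamma}$ for all $i$), then this reduces to

$$P_{out}(\gamma_0) = p(\gamma_\Sigma < \gamma_0) = \left[1 - e^{-\gamma_0/\bar{\gamma}}\right]^M. \tag{7.8}$$

Differentiating (7.8) relative to $\gamma_0$ yields the pdf for $\gamma_\Sigma$:

$$p_{\gamma_\Sigma}(\gamma) = \frac{M}{\bar{\gamma}} \left[1 - e^{-\gamma/\bar{\gamma}}\right]^{M-1} e^{-\gamma/\bar{\gamma}}. \tag{7.9}$$

From (7.9) we see that the average SNR of the combiner output in i.i.d. Rayleigh fading is

$$\bar{\gamma}_\Sigma = \int_0^{\infty} \gamma\, p_{\gamma_\Sigma}(\gamma)\, d\gamma = \int_0^{\infty} \frac{\gamma M}{\bar{\gamma}} \left[1 - e^{-\gamma/\bar{\gamma}}\right]^{M-1} e^{-\gamma/\bar{\gamma}}\, d\gamma = \bar{\gamma} \sum_{i=1}^{M} \frac{1}{i}. \tag{7.10}$$
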



Thus, the average SNR gain increases with M , but not linearly. The biggest gain is obtained by going from no 
diversity to two-branch diversity. Increasing the number of diversity branches from two to three will give much less 
gain than going from one to two, and in general increasing M yields diminishing returns in terms of the SNR gain. 
This trend is also illustrated in Figure 7.2, which shows $P_{out}$ versus $\bar\gamma/\gamma_0$ for different M in i.i.d. Rayleigh fading.
We see that there is dramatic improvement even with just two-branch selection combining: going from M = 1 to
M = 2 at 1% outage probability there is an approximate 12 dB reduction in required SNR, and at .01% outage 
probability there is an approximate 20 dB reduction in required SNR. However, at .01% outage, going from two- 
branch to three-branch diversity results in an additional reduction of approximately 7 dB, and from three-branch to 
four-branch results in an additional reduction of approximately 4 dB. Clearly the power savings is most substantial 
going from no diversity to two-branch diversity, with diminishing returns as the number of branches is increased. 
Figure 7.2: Outage Probability of Selection Combining in Rayleigh Fading.



It should be noted also that even with Rayleigh fading on all branches, the combiner output SNR is no longer
exponentially distributed; equivalently, the amplitude of the combiner output is no longer Rayleigh.
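The diminishing returns in average SNR can be tabulated directly from the harmonic sum in (7.10). The following is a minimal sketch; the function name `sc_average_snr_gain` is introduced here for illustration only.

```python
import math

def sc_average_snr_gain(num_branches):
    """Average combiner SNR divided by average branch SNR for SC, i.e. the sum of 1/i in (7.10)."""
    return sum(1.0 / i for i in range(1, num_branches + 1))

previous = 0.0
for m in range(1, 5):
    gain = sc_average_snr_gain(m)
    # Print the linear gain, the gain in dB, and the increment added by the m-th branch.
    print(f"M={m}: gain={gain:.3f} ({10 * math.log10(gain):.2f} dB), increment={gain - previous:.3f}")
    previous = gain
```

Each added branch contributes only 1/M to the linear gain, which is the diminishing-returns behavior described above.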



Example 7.1: Find the outage probability of BPSK modulation at $P_b = 10^{-3}$ for a Rayleigh fading channel with
SC diversity for M = 1 (no diversity), M = 2, and M = 3. Assume equal branch SNRs of $\bar\gamma$ = 15 dB.

Solution: A BPSK modulated signal with $\gamma_b$ = 7 dB has $P_b = 10^{-3}$. Thus, we have $\gamma_0$ = 7 dB. Substituting
$\gamma_0 = 10^{.7}$ and $\bar\gamma = 10^{1.5}$ into (7.8) yields $P_{out}$ = .1466 for M = 1, $P_{out}$ = .0215 for M = 2, and $P_{out}$ = .0031
for M = 3. We see that each additional branch reduces outage probability by almost an order of magnitude.
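The numbers in Example 7.1 can be checked with a few lines of code. This is a sketch that evaluates (7.8) directly; the helper names `db_to_linear` and `sc_outage` are illustrative and not part of the text.

```python
import math

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def sc_outage(gamma0_db, avg_snr_db, num_branches):
    """Outage probability of M-branch SC in i.i.d. Rayleigh fading, per (7.8)."""
    ratio = db_to_linear(gamma0_db) / db_to_linear(avg_snr_db)
    return (1.0 - math.exp(-ratio)) ** num_branches

# Target SNR of 7 dB and average branch SNR of 15 dB, as in Example 7.1.
for m in (1, 2, 3):
    print(f"M={m}: P_out={sc_outage(7.0, 15.0, m):.4f}")
```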



The average probability of symbol error is obtained from (7.2) with $P_s(\gamma)$ the probability of symbol error
in AWGN for the signal modulation and $p_{\gamma_\Sigma}(\gamma)$ the distribution of the combiner SNR. For most fading distributions
and coherent modulations, this result cannot be obtained in closed-form and must be evaluated numerically
or by approximation. We plot $\bar P_b$ versus $\bar\gamma_b$ in i.i.d. Rayleigh fading, obtained by a numerical evaluation of
$\int Q(\sqrt{2\gamma})\, p_{\gamma_\Sigma}(\gamma)\, d\gamma$ for $p_{\gamma_\Sigma}(\gamma)$ given by (7.9), in Figure 7.3. Note that in this figure the diversity system for M > 8
has a lower error probability than an AWGN channel with the same SNR due to the array gain of the combiner.
The same will be true for MRC and EGC performance. Closed-form results do exist for differential modulation
under i.i.d. Rayleigh fading on each branch [4, Chapter 6.1], [1, Chapter 9.7]. For example, it can be shown that for
DPSK with $p_{\gamma_\Sigma}(\gamma)$ given by (7.9) the average probability of symbol error is given by

$$\bar P_b = \int_0^\infty \frac{1}{2} e^{-\gamma}\, p_{\gamma_\Sigma}(\gamma)\, d\gamma = \frac{M}{2} \sum_{m=0}^{M-1} (-1)^m \frac{\binom{M-1}{m}}{1 + m + \bar\gamma}. \tag{7.11}$$




Figure 7.3: $\bar P_b$ of BPSK under SC with i.i.d. Rayleigh Fading.

In the above derivations we assume that there is no correlation between the branch amplitudes. If the correla- 
tion is nonzero, then there is a slight degradation in performance which is almost negligible for correlations below 
0.5. Derivation of the exact performance degradation due to branch correlation can be found in [1, Chapter 9.7] [2]. 

7.2.3 Threshold Combining 

SC for systems that transmit continuously may require a dedicated receiver on each branch to continuously monitor 
branch SNR. A simpler type of combining, called threshold combining, avoids the need for a dedicated receiver on 
each branch by scanning each of the branches in sequential order and outputting the first signal with SNR above a
given threshold $\gamma_T$. As in SC, since only one branch output is used at a time, co-phasing is not required. Thus, this
technique can be used with either coherent or differential modulation. 

Once a branch is chosen, as long as the SNR on that branch remains above the desired threshold, the combiner 
outputs that signal. If the SNR on the selected branch falls below the threshold, the combiner switches to another 
branch. There are several criteria the combiner can use to decide which branch to switch to [5]. The simplest
criterion is to switch randomly to another branch. With only two-branch diversity this is equivalent to switching
to the other branch when the SNR on the active branch falls below $\gamma_T$. This method is called switch and stay
combining (SSC). The switching process and SNR associated with SSC are illustrated in Figure 7.4. Since the SSC
does not select the branch with the highest SNR, its performance is between that of no diversity and ideal SC.

Figure 7.4: SNR of SSC Technique.



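Let us denote the SNR on the ith branch by $\gamma_i$ and the SNR of the combiner output by $\gamma_\Sigma$. The CDF of $\gamma_\Sigma$
will depend on the threshold level $\gamma_T$ and the CDF of $\gamma_i$. For two-branch diversity with i.i.d. branch statistics the
CDF of the combiner output $P_{\gamma_\Sigma}(\gamma) = p(\gamma_\Sigma \le \gamma)$ can be expressed in terms of the CDF $P_{\gamma_i}(\gamma) = p(\gamma_i \le \gamma)$ and
pdf $p_{\gamma_i}(\gamma)$ of the individual branch SNRs as

$$P_{\gamma_\Sigma}(\gamma) = \begin{cases} P_{\gamma_1}(\gamma_T)\, P_{\gamma_2}(\gamma) & \gamma < \gamma_T \\ p(\gamma_T \le \gamma_1 \le \gamma) + P_{\gamma_1}(\gamma_T)\, P_{\gamma_2}(\gamma) & \gamma \ge \gamma_T. \end{cases} \tag{7.12}$$

For Rayleigh fading in each branch with $\bar\gamma_i = \bar\gamma$, i = 1, 2 this yields

$$P_{\gamma_\Sigma}(\gamma) = \begin{cases} 1 - e^{-\gamma_T/\bar\gamma} - e^{-\gamma/\bar\gamma} + e^{-(\gamma_T+\gamma)/\bar\gamma} & \gamma < \gamma_T \\ 1 - 2e^{-\gamma/\bar\gamma} + e^{-(\gamma_T+\gamma)/\bar\gamma} & \gamma \ge \gamma_T. \end{cases} \tag{7.13}$$

The outage probability $P_{out}$ associated with a given $\gamma_0$ is obtained by evaluating $P_{\gamma_\Sigma}(\gamma)$ at $\gamma = \gamma_0$:

$$P_{out}(\gamma_0) = P_{\gamma_\Sigma}(\gamma_0) = \begin{cases} 1 - e^{-\gamma_T/\bar\gamma} - e^{-\gamma_0/\bar\gamma} + e^{-(\gamma_T+\gamma_0)/\bar\gamma} & \gamma_0 < \gamma_T \\ 1 - 2e^{-\gamma_0/\bar\gamma} + e^{-(\gamma_T+\gamma_0)/\bar\gamma} & \gamma_0 \ge \gamma_T. \end{cases} \tag{7.14}$$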



The performance of SSC under other types of fading, as well as the effects of fading correlation, is studied in 
[1, Chapter 9.8], [6, 7]. In particular, it is shown in [1, Chapter 9.8] that for any fading distribution, SSC with an
optimized threshold has the same outage probability as SC. 



Example 7.2: Find the outage probability of BPSK modulation at $P_b = 10^{-3}$ for two-branch SSC diversity with
i.i.d. Rayleigh fading on each branch for threshold values of $\gamma_T$ = 5, 7, and 10 dB. Assume the average branch
SNR is $\bar\gamma$ = 15 dB. Discuss how the outage probability changes with $\gamma_T$. Also compare outage probability under
SSC with that of SC and no diversity from Example 7.1.

Solution: As in Example 7.1, we have $\gamma_0$ = 7 dB. For $\gamma_T$ = 5 dB, $\gamma_0 \ge \gamma_T$, so we use the second line of (7.14) to
get

$$P_{out} = 1 - 2e^{-10^{.7}/10^{1.5}} + e^{-(10^{.5}+10^{.7})/10^{1.5}} = .0654.$$

For $\gamma_T$ = 7 dB, $\gamma_0 = \gamma_T$, so we again use the second line of (7.14) to get

$$P_{out} = 1 - 2e^{-10^{.7}/10^{1.5}} + e^{-(10^{.7}+10^{.7})/10^{1.5}} = .0215.$$

For $\gamma_T$ = 10 dB, $\gamma_0 < \gamma_T$, so we use the first line of (7.14) to get

$$P_{out} = 1 - e^{-10/10^{1.5}} - e^{-10^{.7}/10^{1.5}} + e^{-(10+10^{.7})/10^{1.5}} = .0397.$$

We see that the outage probability is smaller for $\gamma_T$ = 7 dB than for the other two values. At $\gamma_T$ = 5 dB the
threshold is too low, so the active branch can be below the target $\gamma_0$ for a long time before a switch is made, which
contributes to a large outage probability. At $\gamma_T$ = 10 dB the threshold is too high: the active branch will often fall
below this threshold value, which will cause the combiner to switch to the other antenna even though that other
antenna may have a lower SNR than the active one. This example indicates that the threshold $\gamma_T$ that minimizes
$P_{out}$ is typically close to the target $\gamma_0$.

From Example 7.1, SC has $P_{out}$ = .0215. Thus, $\gamma_T$ = 7 dB is the optimal threshold where SSC performs the
same as SC. We also see that performance with an unoptimized threshold can be much worse than SC. However,
the performance of SSC under all three thresholds is better than the performance without diversity, derived as
$P_{out}$ = .1466 in Example 7.1.
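The piecewise outage expression (7.14) is easy to mistype by hand, so a short numerical check is useful. The sketch below evaluates it at the thresholds used in the worked solution (5, 7, and 10 dB); the function names are illustrative assumptions.

```python
import math

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def ssc_outage(gamma0_db, threshold_db, avg_snr_db):
    """Two-branch SSC outage probability in i.i.d. Rayleigh fading, per (7.14)."""
    g0 = db_to_linear(gamma0_db)
    gt = db_to_linear(threshold_db)
    gbar = db_to_linear(avg_snr_db)
    if g0 < gt:
        # First line of (7.14): the target is below the switching threshold.
        return 1 - math.exp(-gt / gbar) - math.exp(-g0 / gbar) + math.exp(-(gt + g0) / gbar)
    # Second line of (7.14): the target is at or above the switching threshold.
    return 1 - 2 * math.exp(-g0 / gbar) + math.exp(-(gt + g0) / gbar)

for threshold_db in (5.0, 7.0, 10.0):
    print(f"gamma_T={threshold_db:g} dB: P_out={ssc_outage(7.0, threshold_db, 15.0):.4f}")
```

Sweeping `threshold_db` over a finer grid shows the minimum sitting at the target SNR of 7 dB, consistent with the discussion in the example.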



We obtain the pdf of $\gamma_\Sigma$ by differentiating (7.12) relative to $\gamma$. Then the average probability of error is obtained
from (7.2) with $P_s(\gamma)$ the probability of symbol error in AWGN and $p_{\gamma_\Sigma}(\gamma)$ the pdf of the SSC output SNR. For
most fading distributions and coherent modulations, this result cannot be obtained in closed-form and must be
evaluated numerically or by approximation. However, for i.i.d. Rayleigh fading we can differentiate (7.13) to get

$$p_{\gamma_\Sigma}(\gamma) = \begin{cases} \left(1 - e^{-\gamma_T/\bar\gamma}\right) \frac{1}{\bar\gamma}\, e^{-\gamma/\bar\gamma} & \gamma < \gamma_T \\ \left(2 - e^{-\gamma_T/\bar\gamma}\right) \frac{1}{\bar\gamma}\, e^{-\gamma/\bar\gamma} & \gamma \ge \gamma_T. \end{cases} \tag{7.15}$$



As with SC, for most fading distributions and coherent modulations, the resulting average probability of error 
is not in closed-form and must be evaluated numerically. However, closed-form results do exist for differential 
modulation under i.i.d. Rayleigh fading on each branch. In particular, the average probability of symbol error for 
DPSK is given by 

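$$\bar P_b = \int_0^\infty \frac{1}{2} e^{-\gamma}\, p_{\gamma_\Sigma}(\gamma)\, d\gamma = \frac{1}{2(1+\bar\gamma)} \left(1 - e^{-\gamma_T/\bar\gamma} + e^{-\gamma_T} e^{-\gamma_T/\bar\gamma}\right). \tag{7.16}$$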



Example 7.3: Find the average probability of error for DPSK modulation under two-branch SSC diversity with
i.i.d. Rayleigh fading on each branch for threshold values of $\gamma_T$ = 3, 7, and 10 dB. Assume the average branch
SNR is $\bar\gamma$ = 15 dB. Discuss how the average probability of error changes with $\gamma_T$. Also compare average error
probability under SSC with that of SC and with no diversity.

Solution: Evaluating (7.16) with $\bar\gamma$ = 15 dB and $\gamma_T$ = 3, 7, and 10 dB yields, respectively, $\bar P_b$ = .0029, $\bar P_b$ =
.0023, and $\bar P_b$ = .0042. As in the previous example, there is an optimal threshold that minimizes average probability
of error. Setting the threshold too high or too low degrades performance. From (7.11) with M = 2 we have that with SC,
$\bar P_b = (1 + 10^{1.5})^{-1} - (2 + 10^{1.5})^{-1} = 9.12 \cdot 10^{-4}$, which is less than half the error probability of SSC at the
best of the three thresholds. With no diversity, $\bar P_b = .5(1 + 10^{1.5})^{-1} = .0153$, which is more than six times
worse than two-branch SSC with $\gamma_T$ = 7 dB.
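The closed form (7.16) can be evaluated numerically to see how the DPSK error probability under SSC moves with the threshold. The sketch below uses the thresholds from the worked solution (3, 7, and 10 dB); the helper names are hypothetical.

```python
import math

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def ssc_dpsk_ber(threshold_db, avg_snr_db):
    """Average DPSK bit error probability for two-branch SSC in i.i.d. Rayleigh fading, per (7.16)."""
    gt = db_to_linear(threshold_db)
    gbar = db_to_linear(avg_snr_db)
    a = math.exp(-gt / gbar)
    return (1 - a + math.exp(-gt) * a) / (2 * (1 + gbar))

def no_diversity_dpsk_ber(avg_snr_db):
    """Average DPSK bit error probability in Rayleigh fading without diversity."""
    return 0.5 / (1 + db_to_linear(avg_snr_db))

for threshold_db in (3.0, 7.0, 10.0):
    print(f"gamma_T={threshold_db:g} dB: P_b={ssc_dpsk_ber(threshold_db, 15.0):.4f}")
print(f"no diversity: P_b={no_diversity_dpsk_ber(15.0):.4f}")
```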
7.2.4 Maximal Ratio Combining



In SC and SSC, the output of the combiner equals the signal on one of the branches. In maximal ratio combining
(MRC) the output is a weighted sum of all branches, so the $\alpha_i$'s in Figure 7.1 are all nonzero. Since the signals are
cophased, $\alpha_i = a_i e^{-j\theta_i}$, where $\theta_i$ is the phase of the incoming signal on the ith branch. Thus, the envelope of the
combiner output will be $r = \sum_{i=1}^{M} a_i r_i$. Assuming the same noise PSD $N_0$ in each branch yields a total noise PSD
$N_{tot}$ at the combiner output of $N_{tot} = \sum_{i=1}^{M} a_i^2 N_0$. Thus, the output SNR of the combiner is

$$\gamma_\Sigma = \frac{r^2}{N_{tot}} = \frac{1}{N_0} \frac{\left(\sum_{i=1}^{M} a_i r_i\right)^2}{\sum_{i=1}^{M} a_i^2}. \tag{7.17}$$

The goal is to choose the $a_i$'s to maximize $\gamma_\Sigma$. Intuitively, branches with a high SNR should be weighted more
than branches with a low SNR, so the weights $a_i^2$ should be proportional to the branch SNRs $r_i^2/N_0$. We find the
$a_i$'s that maximize $\gamma_\Sigma$ by taking partial derivatives of (7.17) or using the Schwarz inequality [2]. Solving for the
optimal weights yields $a_i^2 = r_i^2/N_0$, and the resulting combiner SNR becomes $\gamma_\Sigma = \sum_{i=1}^{M} r_i^2/N_0 = \sum_{i=1}^{M} \gamma_i$.
Thus, the SNR of the combiner output is the sum of SNRs on each branch. The average combiner SNR increases
linearly with the number of diversity branches M, in contrast to the diminishing returns associated with the average
combiner SNR in SC given by (7.10). As with SC, even with Rayleigh fading on all branches, the combiner output
SNR is no longer exponentially distributed.
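The optimality of weights proportional to the branch amplitudes can be checked numerically against (7.17). The sketch below uses arbitrary illustrative amplitudes (an assumption, not data from the text) and compares MRC weights with equal weights and with randomly drawn weights.

```python
import random

def combiner_snr(weights, amplitudes, noise_psd=1.0):
    """Output SNR of a co-phased linear combiner, per (7.17)."""
    signal = sum(a * r for a, r in zip(weights, amplitudes)) ** 2
    noise = noise_psd * sum(a * a for a in weights)
    return signal / noise

amplitudes = [1.2, 0.4, 2.0]  # hypothetical branch amplitudes r_i, with N_0 = 1

mrc_snr = combiner_snr(amplitudes, amplitudes)        # weights a_i proportional to r_i
sum_of_branch_snrs = sum(r * r for r in amplitudes)   # sum of r_i^2 / N_0
equal_gain_snr = combiner_snr([1.0, 1.0, 1.0], amplitudes)

rng = random.Random(0)
best_random_snr = max(
    combiner_snr([rng.uniform(0.1, 1.0) for _ in amplitudes], amplitudes)
    for _ in range(1000)
)

print(mrc_snr, sum_of_branch_snrs, equal_gain_snr, best_random_snr)
```

Because (7.17) is invariant to a common scaling of the weights, any weights proportional to the $r_i$ give the same maximum, equal to the sum of the branch SNRs.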

To obtain the distribution of $\gamma_\Sigma$ we take the product of the exponential moment generating or characteristic
functions. Assuming i.i.d. Rayleigh fading on each branch with equal average branch SNR $\bar\gamma$, the distribution of
$\gamma_\Sigma$ is chi-squared with 2M degrees of freedom, expected value $\bar\gamma_\Sigma = M\bar\gamma$, and variance $M\bar\gamma^2$:

$$p_{\gamma_\Sigma}(\gamma) = \frac{\gamma^{M-1} e^{-\gamma/\bar\gamma}}{\bar\gamma^{M} (M-1)!}, \quad \gamma \ge 0. \tag{7.18}$$



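The corresponding outage probability for a given threshold $\gamma_0$ is given by

$$P_{out} = p(\gamma_\Sigma < \gamma_0) = \int_0^{\gamma_0} p_{\gamma_\Sigma}(\gamma)\, d\gamma = 1 - e^{-\gamma_0/\bar\gamma} \sum_{k=1}^{M} \frac{(\gamma_0/\bar\gamma)^{k-1}}{(k-1)!}. \tag{7.19}$$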



Figure 7.5 plots $P_{out}$ for maximal ratio combining indexed by the number of diversity branches.
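As a rough numerical comparison, the MRC outage expression (7.19) can be evaluated next to the SC expression (7.8) for the same target and branch SNR. This is a sketch with illustrative function names; the 7 dB target and 15 dB average SNR are borrowed from Example 7.1.

```python
import math

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def mrc_outage(gamma0_db, avg_snr_db, num_branches):
    """Outage probability of M-branch MRC in i.i.d. Rayleigh fading, per (7.19)."""
    x = db_to_linear(gamma0_db) / db_to_linear(avg_snr_db)
    partial_sum = sum(x ** (k - 1) / math.factorial(k - 1) for k in range(1, num_branches + 1))
    return 1.0 - math.exp(-x) * partial_sum

def sc_outage(gamma0_db, avg_snr_db, num_branches):
    """Outage probability of M-branch SC in i.i.d. Rayleigh fading, per (7.8)."""
    x = db_to_linear(gamma0_db) / db_to_linear(avg_snr_db)
    return (1.0 - math.exp(-x)) ** num_branches

for m in (1, 2, 3):
    print(f"M={m}: MRC={mrc_outage(7.0, 15.0, m):.5f}  SC={sc_outage(7.0, 15.0, m):.5f}")
```

For M = 1 the two expressions coincide, and for M > 1 MRC gives the lower outage probability.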

The average probability of symbol error is obtained from (7.2) with $P_s(\gamma)$ the probability of symbol error
in AWGN for the signal modulation and $p_{\gamma_\Sigma}(\gamma)$ the pdf of $\gamma_\Sigma$. For BPSK modulation with i.i.d. Rayleigh fading,
where $p_{\gamma_\Sigma}(\gamma)$ is given by (7.18), it can be shown that [4, Chapter 6.3]

$$\bar P_b = \int_0^\infty Q(\sqrt{2\gamma})\, p_{\gamma_\Sigma}(\gamma)\, d\gamma = \left(\frac{1-\Gamma}{2}\right)^{M} \sum_{m=0}^{M-1} \binom{M-1+m}{m} \left(\frac{1+\Gamma}{2}\right)^{m}, \tag{7.20}$$

where $\Gamma = \sqrt{\bar\gamma/(1+\bar\gamma)}$. This equation is plotted in Figure 7.6. Comparing the outage probability for MRC in
Figure 7.5 with that of SC in Figure 7.2 or the average probability of error for MRC in Figure 7.6 with that of 
SC in Figure 7.3 indicates that MRC has significantly better performance than SC. In Section 7.4 we will use a 
different analysis based on MGFs to compute average error probability under MRC, which can be applied to any 
modulation type, any number of diversity branches, and any fading distribution on the different branches. 

We can obtain a simple upper bound on the average probability of error by applying the Chernoff bound
$Q(x) \le e^{-x^2/2}$ to the Q function. Recall that for static channel gains with MRC, we can approximate the
probability of error as

$$P_s = \alpha_M Q\left(\sqrt{\beta_M \gamma_\Sigma}\right) \le \alpha_M e^{-\beta_M \gamma_\Sigma/2} = \alpha_M e^{-\beta_M(\gamma_1 + \cdots + \gamma_M)/2}. \tag{7.21}$$
Figure 7.5: $P_{out}$ for MRC with i.i.d. Rayleigh fading.



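Integrating over the chi-squared distribution for $\gamma_\Sigma$ yields

$$\bar P_s \le \alpha_M \prod_{i=1}^{M} \frac{1}{1 + \beta_M \bar\gamma_i/2}. \tag{7.22}$$

In the limit of high SNR and assuming that the $\gamma_i$'s are identically distributed with $\bar\gamma_i = \bar\gamma$ this yields

$$\bar P_s \approx \alpha_M \left(\frac{\beta_M \bar\gamma}{2}\right)^{-M}. \tag{7.23}$$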



Thus, at high SNR, the diversity order of MRC is M, the number of antennas, and so MRC achieves full diversity 
order. 
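To get a feel for how loose the Chernoff bound is, the sketch below compares the exact BPSK result (7.20) with the bound (7.22) and the high-SNR approximation (7.23) for i.i.d. branches. It assumes $\alpha_M = 1$ and $\beta_M = 2$, which correspond to BPSK with $P_b = Q(\sqrt{2\gamma})$; the function names are illustrative.

```python
import math

def mrc_bpsk_ber(avg_snr, num_branches):
    """Exact average BER of BPSK with M-branch MRC in i.i.d. Rayleigh fading, per (7.20)."""
    big_gamma = math.sqrt(avg_snr / (1.0 + avg_snr))
    total = sum(
        math.comb(num_branches - 1 + m, m) * ((1.0 + big_gamma) / 2.0) ** m
        for m in range(num_branches)
    )
    return ((1.0 - big_gamma) / 2.0) ** num_branches * total

def chernoff_bound(avg_snr, num_branches, alpha=1.0, beta=2.0):
    """Chernoff upper bound (7.22) with equal average branch SNRs."""
    return alpha * (1.0 + beta * avg_snr / 2.0) ** (-num_branches)

def high_snr_approximation(avg_snr, num_branches, alpha=1.0, beta=2.0):
    """High-SNR form (7.23); its slope on a log-log plot is the diversity order M."""
    return alpha * (beta * avg_snr / 2.0) ** (-num_branches)

for snr_db in (5, 10, 20):
    snr = 10.0 ** (snr_db / 10.0)
    for m in (1, 2, 4):
        print(snr_db, m, mrc_bpsk_ber(snr, m), chernoff_bound(snr, m), high_snr_approximation(snr, m))
```

The bound is not tight in absolute terms, but it decays with the same exponent M as the exact expression, which is all that is needed to read off the diversity order.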



7.2.5 Equal-Gain Combining 



MRC requires knowledge of the time-varying SNR on each branch, which can be very difficult to measure. A
simpler technique is equal-gain combining, which co-phases the signals on each branch and then combines them
with equal weighting, $\alpha_i = e^{-j\theta_i}$. The SNR of the combiner output, assuming equal noise PSD $N_0$ in each branch,
is then given by

$$\gamma_\Sigma = \frac{1}{N_0 M} \left(\sum_{i=1}^{M} r_i\right)^2. \tag{7.24}$$
Figure 7.6: $\bar P_b$ for MRC with i.i.d. Rayleigh fading.

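The pdf and CDF of $\gamma_\Sigma$ do not in general exist in closed-form. For i.i.d. Rayleigh fading and two-branch diversity
with average branch SNR $\bar\gamma$, however, an expression for the CDF in terms of the Q function can be derived as
[8, Chapter 5.6], [4, Chapter 6.4]

$$P_{\gamma_\Sigma}(\gamma) = 1 - e^{-2\gamma/\bar\gamma} - \sqrt{\frac{\pi\gamma}{\bar\gamma}}\, e^{-\gamma/\bar\gamma} \left(1 - 2Q\left(\sqrt{2\gamma/\bar\gamma}\right)\right). \tag{7.25}$$

The resulting outage probability is given by

$$P_{out}(\gamma_0) = 1 - e^{-2\gamma_R} - \sqrt{\pi\gamma_R}\, e^{-\gamma_R} \left(1 - 2Q\left(\sqrt{2\gamma_R}\right)\right), \tag{7.26}$$

where $\gamma_R = \gamma_0/\bar\gamma$. Differentiating (7.25) relative to $\gamma$ yields the pdf

$$p_{\gamma_\Sigma}(\gamma) = \frac{1}{\bar\gamma} e^{-2\gamma/\bar\gamma} - \sqrt{\pi}\, e^{-\gamma/\bar\gamma} \left(\frac{1}{\sqrt{4\gamma\bar\gamma}} - \frac{1}{\bar\gamma}\sqrt{\frac{\gamma}{\bar\gamma}}\right) \left(1 - 2Q\left(\sqrt{2\gamma/\bar\gamma}\right)\right). \tag{7.27}$$

Substituting this into (7.2) for BPSK yields the average probability of bit error

$$\bar P_b = \int_0^\infty Q(\sqrt{2\gamma})\, p_{\gamma_\Sigma}(\gamma)\, d\gamma = .5 \left(1 - \sqrt{1 - \left(\frac{1}{1+\bar\gamma}\right)^2}\right). \tag{7.28}$$

It is shown in [8, Chapter 5.7] that performance of EGC is quite close to that of MRC, typically exhibiting less than
1 dB of power penalty. This is the price paid for the reduced complexity of using equal gains. A more extensive
performance comparison between SC, MRC, and EGC can be found in [1, Chapter 9].

Example 7.4: Compare the average probability of bit error of BPSK under MRC and EGC two-branch diversity
with i.i.d. Rayleigh fading with average SNR of 10 dB on each branch.

Solution: From (7.20), under MRC we have

$$\bar P_b = \left(\frac{1 - \sqrt{10/11}}{2}\right)^2 \left(2 + \sqrt{10/11}\right) = 1.60 \cdot 10^{-3}.$$

From (7.28), under EGC we have

$$\bar P_b = .5 \left(1 - \sqrt{1 - \left(\frac{1}{11}\right)^2}\right) = 2.07 \cdot 10^{-3}.$$

So we see that the performance of MRC and EGC is almost the same.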
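The comparison in Example 7.4 can be reproduced numerically. The sketch below evaluates (7.20) for two-branch MRC and, for EGC, assumes the standard closed form for dual-branch BPSK in i.i.d. Rayleigh fading, $.5\left(1 - \sqrt{1 - (1+\bar\gamma)^{-2}}\right)$; both function names are illustrative.

```python
import math

def mrc_bpsk_ber(avg_snr, num_branches):
    """Average BER of BPSK with M-branch MRC in i.i.d. Rayleigh fading, per (7.20)."""
    big_gamma = math.sqrt(avg_snr / (1.0 + avg_snr))
    total = sum(
        math.comb(num_branches - 1 + m, m) * ((1.0 + big_gamma) / 2.0) ** m
        for m in range(num_branches)
    )
    return ((1.0 - big_gamma) / 2.0) ** num_branches * total

def egc2_bpsk_ber(avg_snr):
    """Average BER of BPSK with two-branch EGC in i.i.d. Rayleigh fading (assumed closed form)."""
    return 0.5 * (1.0 - math.sqrt(1.0 - (1.0 / (1.0 + avg_snr)) ** 2))

avg_snr = 10.0  # 10 dB average SNR per branch, in linear units
print(f"MRC: {mrc_bpsk_ber(avg_snr, 2):.3e}")
print(f"EGC: {egc2_bpsk_ber(avg_snr):.3e}")
```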



7.3 Transmitter Diversity 

In transmit diversity there are multiple transmit antennas with the transmit power divided among these antennas.
Transmit diversity is desirable in systems such as cellular systems where more space, power, and processing
capability is available on the transmit side versus the receive side. Transmit diversity design depends on whether
the complex channel gain is known at the transmitter. When this gain is known, the system is very similar
to receiver diversity. However, without this channel knowledge, transmit diversity gain requires a combination of
space and time diversity via a novel technique called the Alamouti scheme. We now discuss transmit diversity
under the different assumptions about channel knowledge at the transmitter, assuming the channel gains are known
at the receiver.



7.3.1 Channel Known at Transmitter 



Consider a transmit diversity system with M transmit antennas and one receive antenna. We assume the path gain
associated with the ith antenna, given by $r_i e^{j\theta_i}$, is known at the transmitter. This is referred to as having channel side
information (CSI) at the transmitter, or CSIT. Let s(t) denote the transmitted signal with total energy per symbol
$E_s$. This signal is multiplied by a complex gain $\alpha_i = a_i e^{-j\theta_i}$, $0 \le a_i \le 1$, and sent through the ith antenna.

This complex multiplication performs both co-phasing and weighting relative to the channel gains. Due to the
average total energy constraint $E_s$, the weights must satisfy $\sum_{i=1}^{M} a_i^2 = 1$. The weighted signals transmitted over
all antennas are added "in the air", which leads to a received signal given by

$$r(t) = \sum_{i=1}^{M} a_i r_i s(t). \tag{7.29}$$

Let $N_0$ denote the noise PSD in the receiver.

Suppose we wish to set the branch weights to maximize received SNR. Using a similar analysis as in receiver
MRC diversity, we see that the weights $a_i$ that achieve the maximum SNR are given by

$$a_i = \frac{r_i}{\sqrt{\sum_{i=1}^{M} r_i^2}}, \tag{7.30}$$

and the resulting SNR is

$$\gamma_\Sigma = \frac{E_s}{N_0} \sum_{i=1}^{M} r_i^2 = \sum_{i=1}^{M} \gamma_i, \tag{7.31}$$

for $\gamma_i = r_i^2 E_s/N_0$ equal to the branch SNR between the ith transmit antenna and the receive antenna. Thus we
see that transmit diversity when the channel gains are known at the transmitter is very similar to receiver diversity
with MRC: the received SNR is the sum of SNRs on each of the individual branches. In particular, if all antennas
have the same gain $r_i = r$, then $\gamma_\Sigma = M r^2 E_s/N_0$, an M-fold increase over just a single antenna transmitting with
full power. Using the Chernoff bound, we see that for static gains

$$P_s = \alpha_M Q\left(\sqrt{\beta_M \gamma_\Sigma}\right) \le \alpha_M e^{-\beta_M \gamma_\Sigma/2} = \alpha_M e^{-\beta_M(\gamma_1 + \cdots + \gamma_M)/2}. \tag{7.32}$$



Integrating over the chi-squared distribution for $\gamma_\Sigma$ yields

$$\bar P_s \le \alpha_M \prod_{i=1}^{M} \frac{1}{1 + \beta_M \bar\gamma_i/2}. \tag{7.33}$$

In the limit of high SNR and assuming that the $\gamma_i$ are identically distributed with $\bar\gamma_i = \bar\gamma$ this yields

$$\bar P_s \approx \alpha_M \left(\frac{\beta_M \bar\gamma}{2}\right)^{-M}. \tag{7.34}$$

Thus, at high SNR, the diversity order of transmit diversity with MRC is M, so MRC achieves full diversity order.
Note that (7.34) has the same form as the receiver MRC result (7.23): although the transmit energy $E_s$ must be
divided among the transmit antennas through the constraint $\sum_{i=1}^{M} a_i^2 = 1$, the weights (7.30) place more of that
energy on the stronger branches and the signals add coherently at the receiver, so the combined SNR (7.31) is
again the sum of the branch SNRs. The analysis for EGC and SC assuming transmitter channel knowledge is the
same as under receiver diversity, except that the transmit power must be divided among all transmit antennas.

The complication of transmit diversity is to obtain the channel phase and, for SC and MRC, the channel gain,
at the transmitter. These channel values can be measured at the receiver using a pilot technique and then fed back 
to the transmitter. Alternatively, in cellular systems with time-division, the base station can measure the channel 
gain and phase on transmissions from the mobile to the base, and then use these measurements in transmitting back 
to the mobile, since under time-division the forward and reverse links are reciprocal. 



7.3.2 Channel Unknown at Transmitter - The Alamouti Scheme 

We now consider the same model as in the previous subsection but assume that the transmitter no longer knows
the channel gains $r_i e^{j\theta_i}$, so there is no CSIT. In this case it is not obvious how to obtain diversity gain. Consider,
for example, a naive strategy whereby for a two-antenna system we divide the transmit energy equally between
the two antennas. Thus, the transmit signal on antenna i will be $s_i(t) = \sqrt{.5}\, s(t)$ for s(t) the transmit signal with
energy per symbol $E_s$. Assume each antenna has a complex Gaussian channel gain $h_i = r_i e^{j\theta_i}$, i = 1, 2 with mean
zero and variance one. The received signal is then

$$r(t) = \sqrt{.5}\,(h_1 + h_2)\, s(t). \tag{7.35}$$

Note that $h_1 + h_2$ is the sum of two independent complex Gaussian random variables, and is thus a complex Gaussian as well
with mean equal to the sum of means (zero) and variance equal to the sum of variances (2). Thus $\sqrt{.5}\,(h_1 + h_2)$
is a complex Gaussian random variable with mean zero and variance one, so the received signal has the same
distribution as if we had just used one antenna with the full energy per symbol. In other words, we have obtained
no performance advantage from the two antennas, since we could not divide our energy intelligently between them
or obtain coherent combining through co-phasing.
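A small Monte Carlo experiment illustrates why the naive equal-split strategy gains nothing. The sketch below draws unit-variance complex Gaussian gains and compares the effective gain of a single antenna with that of $\sqrt{.5}\,(h_1 + h_2)$; the sample size, seed, and deep-fade level of 0.1 are arbitrary choices made for illustration.

```python
import random

def complex_gaussian(rng):
    """Zero-mean complex Gaussian sample with unit variance (variance 0.5 per real dimension)."""
    sigma = 0.5 ** 0.5
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def mean_power(samples):
    """Sample average of |h|^2."""
    return sum(abs(h) ** 2 for h in samples) / len(samples)

def fraction_below(samples, level):
    """Fraction of samples whose power |h|^2 falls below the given level (a deep fade)."""
    return sum(1 for h in samples if abs(h) ** 2 < level) / len(samples)

rng = random.Random(1)
num_samples = 50000
single = [complex_gaussian(rng) for _ in range(num_samples)]
naive = [(0.5 ** 0.5) * (complex_gaussian(rng) + complex_gaussian(rng)) for _ in range(num_samples)]

print("mean power:", mean_power(single), mean_power(naive))
print("P(|h|^2 < 0.1):", fraction_below(single, 0.1), fraction_below(naive, 0.1))
```

Both the mean power and the deep-fade probability come out essentially the same for the two cases, as expected when the effective gain has the same distribution.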

Transmit diversity gain can be obtained even in the absence of channel information with an appropriate scheme
to exploit the antennas. A particularly simple and prevalent scheme for this diversity that combines both space and
time diversity was developed by Alamouti in [9]. Alamouti's scheme is designed for a digital communication
system with two-antenna transmit diversity. The scheme works over two symbol periods where it is assumed that
the channel gain is constant over this time. Over the first symbol period two different symbols $s_1$ and $s_2$, each with
energy $E_s/2$, are transmitted simultaneously from antennas 1 and 2, respectively. Over the next symbol period
symbol $-s_2^*$ is transmitted from antenna 1 and symbol $s_1^*$ is transmitted from antenna 2, each with symbol energy
$E_s/2$.

Assume complex channel gains $h_i = r_i e^{j\theta_i}$, i = 1, 2 between the ith transmit antenna and the receive antenna.
The received symbol over the first symbol period is $y_1 = h_1 s_1 + h_2 s_2 + n_1$ and the received symbol over the second
symbol period is $y_2 = -h_1 s_2^* + h_2 s_1^* + n_2$, where $n_i$, i = 1, 2 is the AWGN noise sample at the receiver associated
with the ith symbol transmission. We assume the noise sample has mean zero and power $N_0$.

The receiver uses these sequentially received symbols to form the vector $\mathbf{y} = [y_1\ y_2^*]^T$ given by

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2^* \end{bmatrix} = \begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2^* \end{bmatrix} = \mathbf{H}_A \mathbf{s} + \mathbf{n},$$

where $\mathbf{s} = [s_1\ s_2]^T$, $\mathbf{n} = [n_1\ n_2^*]^T$, and

$$\mathbf{H}_A = \begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix}.$$

Let us define the new vector $\mathbf{z} = \mathbf{H}_A^H \mathbf{y}$. The structure of $\mathbf{H}_A$ implies that

$$\mathbf{H}_A^H \mathbf{H}_A = (|h_1|^2 + |h_2|^2)\, \mathbf{I}_2 \tag{7.36}$$

is diagonal, and thus

$$\mathbf{z} = [z_1\ z_2]^T = (|h_1|^2 + |h_2|^2)\, \mathbf{I}_2\, \mathbf{s} + \tilde{\mathbf{n}}, \tag{7.37}$$

where $\tilde{\mathbf{n}} = \mathbf{H}_A^H \mathbf{n}$ is a complex Gaussian noise vector with mean zero and covariance matrix
$E[\tilde{\mathbf{n}} \tilde{\mathbf{n}}^H] = (|h_1|^2 + |h_2|^2) N_0 \mathbf{I}_2$. The diagonal nature of $\mathbf{z}$ effectively decouples the two symbol transmissions, so that each component of
$\mathbf{z}$ corresponds to one of the transmitted symbols:

$$z_i = (|h_1|^2 + |h_2|^2)\, s_i + \tilde n_i, \quad i = 1, 2. \tag{7.38}$$

The received SNR thus corresponds to the SNR for $z_i$ given by

$$\gamma_i = \frac{(|h_1|^2 + |h_2|^2) E_s}{2 N_0}, \tag{7.39}$$

where the factor of 2 comes from the fact that $s_i$ is transmitted using half the total symbol energy $E_s$. The received
SNR is thus equal to the sum of SNRs on each branch when each branch carries half the symbol energy, which has the
same form as transmit diversity with MRC assuming that the channel gains are known at the transmitter. Thus, the
Alamouti scheme achieves a diversity order of 2, the maximum possible for a two-antenna transmit system, despite
the fact that channel knowledge is not available at the transmitter. However, it only achieves an array gain of 1,
whereas MRC can achieve an array gain and a diversity gain of 2. The Alamouti scheme can be generalized for
M > 2 when the constellations are real, but if the constellations are complex the generalization is only possible
with a reduction in code rates [10].
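The decoupling performed by the Alamouti receiver can be verified with plain complex arithmetic. The sketch below uses hypothetical channel gains and symbols and omits noise so that the identity $z_i = (|h_1|^2 + |h_2|^2) s_i$ is visible exactly; the function name `alamouti_combine` is introduced here for illustration.

```python
def alamouti_combine(h1, h2, y1, y2):
    """Form z = H_A^H [y1, y2*]^T for H_A = [[h1, h2], [h2*, -h1*]]."""
    y2c = y2.conjugate()
    z1 = h1.conjugate() * y1 + h2 * y2c
    z2 = h2.conjugate() * y1 - h1 * y2c
    return z1, z2

# Hypothetical channel gains and transmitted symbols (illustrative values only).
h1, h2 = complex(0.8, -0.3), complex(-0.2, 1.1)
s1, s2 = complex(1, 1), complex(-1, 1)

# Received samples over the two symbol periods, with the noise terms set to zero.
y1 = h1 * s1 + h2 * s2
y2 = -h1 * s2.conjugate() + h2 * s1.conjugate()

z1, z2 = alamouti_combine(h1, h2, y1, y2)
gain = abs(h1) ** 2 + abs(h2) ** 2

print(z1 / gain, z2 / gain)  # recovers s1 and s2 up to floating-point rounding
```

With noise included, each $z_i$ would carry the additional term $\tilde n_i$, but the cross-symbol interference still cancels in the same way.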
7.4 Moment Generating Functions in Diversity Analysis



In this section we use the MGFs introduced in Section 6.3.3 to greatly simplify the analysis of average error
probability under diversity. The use of MGFs in diversity analysis arises from the difficulty in computing the pdf $p_{\gamma_\Sigma}(\gamma)$
of the combiner SNR $\gamma_\Sigma$. Specifically, although the average probability of error and outage probability associated
with diversity combining are given by the simple formulas (7.2) and (7.3), these formulas require integration over
the distribution $p_{\gamma_\Sigma}(\gamma)$. This distribution is often not in closed-form for an arbitrary number of diversity branches
with different fading distributions on each branch, regardless of the combining technique that is used. The pdf
$p_{\gamma_\Sigma}(\gamma)$ is often in the form of an infinite-range integral, in which case the expressions for (7.2) and (7.3) become
double integrals that can be difficult to evaluate numerically. Even when $p_{\gamma_\Sigma}(\gamma)$ is in closed form, the corresponding
integrals (7.2) and (7.3) may not lead to closed-form solutions and may be difficult to evaluate numerically.
A large body of work over many decades has addressed approximations and numerical techniques to compute the
integrals associated with average probability of symbol error for different modulations, fading distributions, and
combining techniques (see [11] and the references therein). Expressing the average error probability in terms of
the MGF for $\gamma_\Sigma$ instead of its pdf often eliminates these integration difficulties. Specifically, when the diversity
fading paths are independent but not necessarily identically distributed, the average error probability based on
the MGF of $\gamma_\Sigma$ is typically in closed-form or consists of a single finite-range integral that can be easily computed
numerically.

The simplest application of MGFs in diversity analysis is for coherent modulation with MRC, so this is treated 
first. We then discuss the use of MGFs in the analysis of average error probability under EGC and SC. 

7.4.1 Diversity Analysis for MRC 

The simplicity of using MGFs in the analysis of MRC stems from the fact that, as derived in Section 7.2.4, the
combiner SNR $\gamma_\Sigma$ is the sum of the $\gamma_i$'s, the branch SNRs:

$$\gamma_\Sigma = \sum_{i=1}^{M} \gamma_i. \tag{7.40}$$

As in the analysis of average error probability without diversity (Section 6.3.3), let us again assume that the
probability of error in AWGN for the modulation of interest can be expressed either as an exponential function of
$\gamma_s$, as in (6.67), or as a finite range integral of such a function, as in (6.68).

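We first consider the case where $P_s$ is in the form of (6.67). Then the average probability of symbol error
under MRC is

$$\bar P_s = \int_0^\infty c_1 \exp[-c_2 \gamma]\, p_{\gamma_\Sigma}(\gamma)\, d\gamma. \tag{7.41}$$

We assume that the branch SNRs are independent, so that their joint pdf becomes a product of the individual pdfs:
$p_{\gamma_1, \ldots, \gamma_M}(\gamma_1, \ldots, \gamma_M) = p_{\gamma_1}(\gamma_1) \cdots p_{\gamma_M}(\gamma_M)$. Using this factorization and substituting $\gamma = \gamma_1 + \ldots + \gamma_M$ in
(7.41) yields

$$\bar P_s = c_1 \underbrace{\int_0^\infty \!\cdots\! \int_0^\infty}_{M\text{-fold}} \exp[-c_2(\gamma_1 + \ldots + \gamma_M)]\, p_{\gamma_1}(\gamma_1) \cdots p_{\gamma_M}(\gamma_M)\, d\gamma_1 \cdots d\gamma_M. \tag{7.42}$$

Now using the product forms $\exp[-\beta(\gamma_1 + \ldots + \gamma_M)] = \prod_{i=1}^{M} \exp[-\beta\gamma_i]$ and $p_{\gamma_1}(\gamma_1) \cdots p_{\gamma_M}(\gamma_M) = \prod_{i=1}^{M} p_{\gamma_i}(\gamma_i)$
in (7.42) yields

$$\bar P_s = c_1 \underbrace{\int_0^\infty \!\cdots\! \int_0^\infty}_{M\text{-fold}} \prod_{i=1}^{M} \exp[-c_2\gamma_i]\, p_{\gamma_i}(\gamma_i)\, d\gamma_1 \cdots d\gamma_M. \tag{7.43}$$

Finally, switching the order of integration and multiplication in (7.43) yields our desired final form

$$\bar P_s = c_1 \prod_{i=1}^{M} \int_0^\infty \exp[-c_2\gamma_i]\, p_{\gamma_i}(\gamma_i)\, d\gamma_i = c_1 \prod_{i=1}^{M} \mathcal{M}_{\gamma_i}(-c_2). \tag{7.44}$$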



Thus, the average probability of symbol error is just the product of MGFs associated with the SNR on each branch.
Similarly, when $P_s$ is in the form of (6.68), we get

$$\bar P_s = \int_0^\infty \int_A^B c_1 \exp[-c_2(x)\gamma]\, dx\; p_{\gamma_\Sigma}(\gamma)\, d\gamma = \underbrace{\int_0^\infty \!\cdots\! \int_0^\infty}_{M\text{-fold}} \int_A^B c_1 \prod_{i=1}^{M} \exp[-c_2(x)\gamma_i]\, p_{\gamma_i}(\gamma_i)\, dx\, d\gamma_1 \cdots d\gamma_M. \tag{7.45}$$

Again switching the order of integration and multiplication yields our desired final form

$$\bar P_s = c_1 \int_A^B \prod_{i=1}^{M} \int_0^\infty \exp[-c_2(x)\gamma_i]\, p_{\gamma_i}(\gamma_i)\, d\gamma_i\, dx = c_1 \int_A^B \prod_{i=1}^{M} \mathcal{M}_{\gamma_i}(-c_2(x))\, dx. \tag{7.46}$$

Thus, the average probability of symbol error is just a single finite-range integral of the product of MGFs associated
with the SNR on each branch. The simplicity of (7.44) and (7.46) is quite remarkable, given that these expressions
apply for any number of diversity branches and any type of fading distribution on each branch, as long as the branch
SNRs are independent.

We now apply these general results to specific modulations and fading distributions. Let us first consider
DPSK, where $P_b(\gamma_b) = .5e^{-\gamma_b}$ in AWGN is in the form of (6.67) with $c_1 = 1/2$ and $c_2 = 1$. Thus, from (7.44),
the average probability of bit error in DPSK under M-fold MRC diversity is

$$\bar P_b = \frac{1}{2} \prod_{i=1}^{M} \mathcal{M}_{\gamma_i}(-1), \tag{7.47}$$

where $\mathcal{M}_{\gamma_i}(s)$ is the MGF of the fading distribution for the ith diversity branch, given by (6.63), (6.64), and (6.65)
for, respectively, Rayleigh, Ricean, and Nakagami fading. Note that this reduces to the average probability of bit
error without diversity given by (6.60) for M = 1.



Example 7.5: Compute the average probability of bit error for DPSK modulation under three-branch MRC
assuming independent Rayleigh fading in each branch with $\bar\gamma_1$ = 15 dB and $\bar\gamma_2 = \bar\gamma_3$ = 5 dB. Compare with the case of no
diversity with $\bar\gamma$ = 15 dB.

Solution: From (6.63), $\mathcal{M}_{\gamma_i}(s) = (1 - s\bar\gamma_i)^{-1}$. Using this MGF in (7.47) with s = -1 yields

$$\bar P_b = \frac{1}{2} \cdot \frac{1}{1 + 10^{1.5}} \left(\frac{1}{1 + 10^{.5}}\right)^2 = 8.85 \times 10^{-4}.$$

With no diversity we have

$$\bar P_b = \frac{1}{2(1 + 10^{1.5})} = 1.53 \times 10^{-2}.$$

This indicates that additional diversity branches can significantly reduce average BER, even when the SNR on these
branches is somewhat low.
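Since (7.47) is just a product of MGFs evaluated at s = -1, the calculation in Example 7.5 reduces to a few lines of code. The sketch below assumes the Rayleigh MGF $(1 - s\bar\gamma)^{-1}$ quoted in the solution; the function names are illustrative.

```python
def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def rayleigh_mgf(s, avg_snr):
    """MGF of the SNR in Rayleigh fading: (1 - s * avg_snr)^-1."""
    return 1.0 / (1.0 - s * avg_snr)

def dpsk_mrc_ber(branch_mgfs):
    """Average DPSK BER under MRC: half the product of the branch MGFs at s = -1, per (7.47)."""
    product = 1.0
    for mgf in branch_mgfs:
        product *= mgf(-1.0)
    return 0.5 * product

branch_snrs_db = [15.0, 5.0, 5.0]
mgfs = [lambda s, g=db_to_linear(x): rayleigh_mgf(s, g) for x in branch_snrs_db]

three_branch = dpsk_mrc_ber(mgfs)
no_diversity = dpsk_mrc_ber(mgfs[:1])
print(f"three-branch MRC: {three_branch:.3e}")
print(f"no diversity:     {no_diversity:.3e}")
```

Swapping in a different MGF for any branch requires no other change, which is the practical appeal of the MGF approach.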
Example 7.6: Compute the average probability of bit error for DPSK modulation under three-branch MRC
assuming Nakagami fading in the first branch with m = 2 and $\bar\gamma_1$ = 15 dB, Ricean fading in the second branch with
K = 3 and $\bar\gamma_2$ = 5 dB, and Nakagami fading in the third branch with m = 4 and $\bar\gamma_3$ = 5 dB. Compare with the
results of the prior example.

Solution: From (6.64) and (6.65), for Nakagami fading $\mathcal{M}_{\gamma_i}(s) = (1 - s\bar\gamma_i/m)^{-m}$ and for Ricean fading

$$\mathcal{M}_{\gamma_s}(s) = \frac{1+K}{1+K-s\bar\gamma_s} \exp\left[\frac{K s \bar\gamma_s}{1+K-s\bar\gamma_s}\right].$$

Using these MGFs in (7.47) with s = -1 yields

$$\bar P_b = \frac{1}{2} \left(\frac{1}{1 + 10^{1.5}/2}\right)^2 \frac{4}{4 + 10^{.5}} \exp\left[\frac{-3 \cdot 10^{.5}}{4 + 10^{.5}}\right] \left(\frac{1}{1 + 10^{.5}/4}\right)^4 = 2.56 \cdot 10^{-5},$$

which is more than an order of magnitude lower than the average error probability under independent Rayleigh fading
with the same branch SNRs derived in the previous example. This indicates that Nakagami and Ricean fading are
much more benign distributions than Rayleigh, especially when multiple branches are combined under MRC.
This example also illustrates the power of the MGF approach: computing average probability of error when the
branch SNRs follow different distributions just consists of multiplying together different functions in closed-form,
whose result is then also in closed-form. Computing the pdf of the sum of random variables from different families
involves the convolution of their pdfs, which rarely leads to a closed-form pdf.
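The mixed-fading product of MGFs written out in the solution can be evaluated directly. The sketch below implements the Nakagami and Ricean MGFs in the forms quoted above and multiplies them as in (7.47); the function names are assumptions made for illustration.

```python
import math

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def nakagami_mgf(s, avg_snr, m):
    """MGF of the SNR in Nakagami-m fading: (1 - s * avg_snr / m)^-m."""
    return (1.0 - s * avg_snr / m) ** (-m)

def ricean_mgf(s, avg_snr, k_factor):
    """MGF of the SNR in Ricean fading with K-factor k_factor."""
    denom = 1.0 + k_factor - s * avg_snr
    return (1.0 + k_factor) / denom * math.exp(k_factor * s * avg_snr / denom)

def rayleigh_mgf(s, avg_snr):
    """MGF of the SNR in Rayleigh fading, used here only for comparison."""
    return 1.0 / (1.0 - s * avg_snr)

s = -1.0  # DPSK corresponds to c_2 = 1 in (7.44)
mixed = 0.5 * (
    nakagami_mgf(s, db_to_linear(15.0), 2)
    * ricean_mgf(s, db_to_linear(5.0), 3)
    * nakagami_mgf(s, db_to_linear(5.0), 4)
)
rayleigh = 0.5 * rayleigh_mgf(s, db_to_linear(15.0)) * rayleigh_mgf(s, db_to_linear(5.0)) ** 2

print(f"mixed Nakagami/Ricean branches: {mixed:.3e}")
print(f"Rayleigh branches, same SNRs:   {rayleigh:.3e}")
print(f"ratio: {rayleigh / mixed:.1f}")
```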



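For BPSK we see from (6.44) that $P_b$ has the same form as (6.68) with the integration over $\phi$ where $c_1 = 1/\pi$,
A = 0, $B = \pi/2$, and $c_2(\phi) = 1/\sin^2\phi$. Thus we obtain the average bit error probability for BPSK with M-fold
diversity as

$$\bar P_b = \frac{1}{\pi} \int_0^{\pi/2} \prod_{i=1}^{M} \mathcal{M}_{\gamma_i}\left(-\frac{1}{\sin^2\phi}\right) d\phi. \tag{7.48}$$

Similarly, if $P_s = \alpha Q(\sqrt{2g\gamma_s})$ then $P_s$ has the same form as (6.68) with integration over $\phi$, $c_1 = \alpha/\pi$, A = 0,
$B = \pi/2$, and $c_2(\phi) = g/\sin^2\phi$, and the resulting average symbol error probability with M-fold diversity is given
by

$$\bar P_s = \frac{\alpha}{\pi} \int_0^{\pi/2} \prod_{i=1}^{M} \mathcal{M}_{\gamma_i}\left(-\frac{g}{\sin^2\phi}\right) d\phi. \tag{7.49}$$

If the branch SNRs are i.i.d. then this simplifies to

$$\bar P_s = \frac{\alpha}{\pi} \int_0^{\pi/2} \left(\mathcal{M}_{\gamma}\left(-\frac{g}{\sin^2\phi}\right)\right)^{M} d\phi, \tag{7.50}$$

where $\mathcal{M}_{\gamma}(s)$ is the common MGF for the branch SNRs. The probability of symbol error for MPSK in (6.45) is
also in the form (6.68), leading to average symbol error probability

$$\bar P_s = \frac{1}{\pi} \int_0^{(M-1)\pi/M} \prod_{i=1}^{M} \mathcal{M}_{\gamma_i}\left(-\frac{g}{\sin^2\phi}\right) d\phi, \tag{7.51}$$

where $g = \sin^2(\pi/M)$. Here M in the integration limit and in g denotes the MPSK constellation size, whereas M in the
product is the number of diversity branches. For i.i.d. fading this simplifies to

$$\bar P_s = \frac{1}{\pi} \int_0^{(M-1)\pi/M} \left(\mathcal{M}_{\gamma}\left(-\frac{g}{\sin^2\phi}\right)\right)^{M} d\phi. \tag{7.52}$$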



Example 7.7: Find an expression for the average symbol error probability for 8PSK modulation for two-branch 
MRC combining, where each branch is Rayleigh fading with average SNR of 20 dB. 

Solution: The MGF for Rayleigh is $M_{\gamma_i}(s) = (1 - s\bar{\gamma}_i)^{-1}$. Using this MGF in (7.52) with $s = -\sin^2(\pi/8)/\sin^2\phi$
and $\bar{\gamma} = 100$ yields

$$\bar{P}_s = \frac{1}{\pi} \int_0^{7\pi/8} \left(\frac{1}{1 + \dfrac{100 \sin^2(\pi/8)}{\sin^2\phi}}\right)^{2} d\phi.$$

This expression does not lead to a closed-form solution and so must be evaluated numerically, which results in
$\bar{P}_s = 1.56 \cdot 10^{-3}$.
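
The integral in Example 7.7 has no closed form, but a simple quadrature rule is enough to evaluate it. The following is a minimal sketch, assuming i.i.d. Rayleigh branches so that the i.i.d. MPSK expression (7.52) applies. The function name, its arguments, and the choice of the midpoint rule are illustrative assumptions and do not come from the text.

```python
import math


def mpsk_mrc_rayleigh_ser(mod_order, branches, avg_snr, steps=20000):
    """Evaluate (7.52) for i.i.d. Rayleigh branches with the midpoint rule.

    mod_order: MPSK constellation size (8 for 8PSK)
    branches:  number of MRC diversity branches
    avg_snr:   average SNR per branch in linear units (not dB)
    """
    g = math.sin(math.pi / mod_order) ** 2
    upper = (mod_order - 1) * math.pi / mod_order
    width = upper / steps
    total = 0.0
    for n in range(steps):
        phi = (n + 0.5) * width  # midpoints avoid sin(phi) = 0 at phi = 0
        # Rayleigh MGF (1 - s * avg_snr)^(-1) evaluated at s = -g / sin^2(phi)
        mgf = 1.0 / (1.0 + g * avg_snr / math.sin(phi) ** 2)
        total += mgf ** branches
    return total * width / math.pi


# Example 7.7: 8PSK, two-branch MRC, 20 dB average SNR per branch
ps = mpsk_mrc_rayleigh_ser(mod_order=8, branches=2, avg_snr=10 ** (20 / 10))
print(f"Average 8PSK symbol error probability: {ps:.2e}")
```

With these arguments the result should agree with the value quoted in the example to the precision shown there. Other fading distributions could be handled by swapping in a different MGF inside the loop.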



We can use similar techniques to extend the derivation of the exact error probability for MQAM in fading,
given by (7.53), to include MRC diversity. Specifically, we first integrate the expression for $P_s$ in AWGN,
expressed in (6.80) using the alternate representation of $Q$ and $Q^2$, over the distribution of $\gamma_\Sigma$. Since $\gamma_\Sigma = \sum_i \gamma_i$
and the SNRs are independent, the exponential function and distribution in the resulting expression can be written
in product form. Then we use the same reordering of integration and multiplication used above in the MPSK
derivation. The resulting average probability of symbol error for MQAM modulation with MRC combining is
given by



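$$\bar{P}_s = \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right) \int_0^{\pi/2} \prod_{i=1}^{M} M_{\gamma_i}\left(-\frac{g}{\sin^2\phi}\right) d\phi - \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right)^{2} \int_0^{\pi/4} \prod_{i=1}^{M} M_{\gamma_i}\left(-\frac{g}{\sin^2\phi}\right) d\phi, \tag{7.53}$$

where $g = 1.5/(M-1)$ for MQAM, with $M$ in $g$ and in the leading factors denoting the constellation size and $M$ in the
product denoting the number of branches.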



More details on the use of MGFs to obtain average probability of error under $M$-fold MRC diversity for a broad
class of modulations can be found in [10, Chapter 9.2].



7.4.2 Diversity Analysis for EGC and SC 

MGFs are less useful in the analysis of EGC and SC than in MRC. The reason is that with MRC, $\gamma_\Sigma = \sum_i \gamma_i$,
so $\exp[-c_2\gamma_\Sigma] = \prod_i \exp[-c_2\gamma_i]$. This factorization leads directly to the simple formulas whereby probability of
symbol error is based on a product of MGFs associated with each of the branch SNRs. Unfortunately, neither EGC
nor SC leads to this type of factorization. However, working with the MGF of $\gamma_\Sigma$ can sometimes lead to simpler
results than working directly with its pdf. This is illustrated in [1, Chapter 9.3.3], where the exact probability of
symbol error for MPSK is obtained based on the characteristic function associated with each branch SNR, where
the characteristic function is just the MGF evaluated at $s = j2\pi f$, i.e. it is the Fourier transform of the pdf. The
resulting average error probability, given by [10, Equation 9.78], is a finite-range integral over a sum of closed-form
expressions, and is thus easily evaluated numerically.

7.4.3 Diversity Analysis for Noncoherent and Differentially Coherent Modulation

A similar approach to determining the average symbol error probability of noncoherent and differentially coherent
modulations with diversity combining is presented in [12, 10]. This approach differs from that of the coherent
modulation case in that it relies on an alternate form of the Marcum Q-function instead of the Gaussian Q-function,
since the BER of noncoherent and differentially coherent modulations in AWGN is given in terms of the Marcum
Q-function. Otherwise the approach is essentially the same as in the coherent case, and leads to BER expressions
involving a single finite-range integral that can be readily evaluated numerically. More details on this approach can
be found in [12] and [10].

Chapter 7 Problems



1. Find the outage probability of QPSK modulation at $P_s = 10^{-3}$ for a Rayleigh fading channel with SC
diversity for $M = 1$ (no diversity), $M = 2$, and $M = 3$. Assume branch SNRs $\bar{\gamma}_1 = 10$ dB, $\bar{\gamma}_2 = 15$ dB,
and $\bar{\gamma}_3 = 20$ dB.

2. Plot the pdf $p_{\gamma_\Sigma}(\gamma)$ given by (7.9) for the selection-combiner SNR in Rayleigh fading with $M$-branch
diversity assuming $M = 1, 2, 4, 8$, and 10. Assume each branch has average SNR of 10 dB. Your plot should
be linear on both axes and should focus on the range of linear $\gamma$ values $0 < \gamma < 60$. Discuss how the pdf
changes with increasing $M$ and why that leads to lower probability of error.

3. Derive the average probability of bit error for DPSK under SC with i.i.d. Rayleigh fading on each branch as 
given by (7.11). 

4. Derive a general expression for the CDF of the SSC output SNR for branch statistics that are not i.i.d. and
show that it reduces to (7.12) for i.i.d. branch statistics. Evaluate your expression assuming Rayleigh fading
in each branch with different average SNRs $\bar{\gamma}_1$ and $\bar{\gamma}_2$.

5. Derive the average probability of bit error for DPSK under SSC with i.i.d. Rayleigh fading on each branch 
as given by (7.16). 

6. Compare the average probability of bit error for DPSK under no diversity, SC, and SSC, assuming i.i.d.
Rayleigh fading on each branch and an average branch SNR of 10 dB and of 20 dB. How does the relative
performance change as the branch SNR increases?

7. Plot the average probability of bit error for DPSK under SSC with M = 2, 3, and 4, assuming i.i.d. Rayleigh 
fading on each branch and an average branch SNR ranging from 0 to 20 dB. 

8. Show that the weights $a_i$ that maximize $\gamma_\Sigma$ under MRC are $a_i^2 = r_i^2/N$ for $N$ the common noise power on
each branch. Also show that with these weights, $\gamma_\Sigma = \sum_i \gamma_i$.

9. This problem illustrates that you can get performance gains from diversity combining even without fading,
due to noise averaging. Consider an AWGN channel with $N$-branch diversity combining and $\gamma_i = 10$ dB
per branch. Assume MQAM modulation with $M = 4$ and use the approximation $P_b = .2e^{-1.5\gamma/(M-1)}$ for
bit error probability, where $\gamma$ is the received SNR.

(a) Find $P_b$ for $N = 1$.

(b) Find $N$ so that under MRC, $P_b < 10^{-6}$.

10. Derive the average probability of bit error for BPSK under MRC with i.i.d. Rayleigh fading on each branch 
as given by (7.20). 

11. Derive the average probability of bit error for BPSK under EGC with i.i.d. Rayleigh fading on each branch 
as given by (7.28). 

12. Compare the average probability of bit error for BPSK modulation under no diversity, two-branch SC,
two-branch SSC, two-branch EGC, and two-branch MRC. Assume i.i.d. Rayleigh fading on each branch with
equal branch SNR of 10 dB and of 20 dB. How does the relative performance change as the branch SNR
increases?

13. Plot the average probability of bit error for BPSK under both MRC and EGC assuming two-branch diversity
with i.i.d. Rayleigh fading on each branch and average branch SNR ranging from 0 to 20 dB. What is the
maximum dB penalty of EGC as compared to MRC?

14. Compare the outage probability of BPSK modulation at $P_b = 10^{-3}$ under MRC and under EGC assuming
two-branch diversity with i.i.d. Rayleigh fading on each branch and average branch SNR $\bar{\gamma} = 10$ dB.

15. Compare the average probability of bit error for BPSK under MRC and under EGC assuming two-branch
diversity with i.i.d. Rayleigh fading on each branch and average branch SNR $\bar{\gamma} = 10$ dB.

16. Compute the average BER of a channel with two-branch transmit diversity under the Alamouti scheme, 
assuming the branch SNR is 10 dB. 

17. Consider a fading distribution $p(\gamma)$ where $\int_0^\infty p(\gamma)e^{-x\gamma}\,d\gamma = .01\bar{\gamma}/\sqrt{x}$. Find the average $P_b$ for a BPSK
modulated signal where the receiver has 2-branch diversity with MRC combining, and each branch has an
average SNR of 10 dB and experiences independent fading with distribution $p(\gamma)$.

18. Consider a fading channel with BPSK modulation, 3 branch diversity with MRC, where each branch experi- 
ences independent fading with an average received SNR of 15 dB. Compute the average BER of this channel 
for Rayleigh fading and for Nakagami fading with m = 2 (Using the alternate Q function representation 
greatly simplifies this computation, at least for Nakagami fading). 

19. Plot the average probability of error as a function of branch SNR for a two branch MRC system with BPSK 
modulation, where the first branch has Rayleigh fading and the second branch has Nakagami-m fading with 
m=2. Assume the two branches have the same average SNR, and your plots should have that average branch 
SNR ranging from 5 to 20 dB. 

20. Plot the average probability of error as a function of branch SNR for an $M$-branch MRC system with 8PSK
modulation for $M = 1, 2, 4, 8$. Assume each branch has Rayleigh fading with the same average SNR. Your
plots should have an SNR that ranges from 5 to 20 dB.

21. Derive the average probability of symbol error for MQAM modulation under MRC diversity given by (7.53)
from the probability of error in AWGN (6.80) by utilizing the alternate representation of $Q$ and $Q^2$.

22. Compare the average probability of symbol error for 16PSK and 16QAM modulation, assuming three-branch 
MRC diversity with Rayleigh fading on the first branch and Ricean fading on the second and third branches 
with K = 2. Assume equal average branch SNRs of 10 dB. 

23. Plot the average probability of error as a function of branch SNR for an $M$-branch MRC system with 16QAM
modulation for $M = 1, 2, 4, 8$. Assume each branch has Rayleigh fading with the same average SNR. Your
plots should have an SNR that ranges from 5 to 20 dB. 

Chapter 8

Coding for Wireless Channels 



Coding allows bit errors introduced by transmission of a modulated signal through a wireless channel to be either 
detected or corrected by a decoder in the receiver. Coding can be considered as the embedding of signal constel- 
lation points in a higher dimensional signaling space than is needed for communications. By going to a higher 
dimensional space, the distance between points can be increased, which provides for better error correction and 
detection. 

In this chapter we describe codes designed for AWGN channels and for fading channels. Codes designed for
AWGN channels do not typically work well on fading channels since they cannot correct for long error bursts that
occur in deep fading. Codes for fading channels are mainly based on using an AWGN channel code combined with
interleaving, but the criterion for the code design changes to provide fading diversity. Other coding techniques to
combat performance degradation due to fading include unequal error protection codes and joint source and channel
coding.

We first provide an overview of code design in both fading and AWGN, along with basic design parameters
such as minimum distance, coding gain, bandwidth expansion, and diversity order. Sections 8.2-8.3 provide a
basic overview of block and convolutional code designs for AWGN channels. While these designs are not directly
applicable to fading channels, codes for fading channels and other codes used in wireless systems (e.g. spreading
codes in CDMA) require background in these fundamental techniques. Concatenated codes and their evolution
to turbo and low density parity check codes for AWGN channels are also described. These extremely powerful
codes exhibit near-capacity performance with reasonable complexity levels. Coded modulation was invented in
the late 1970s as a technique to obtain error correction through a joint design of the modulation and coding. We
will discuss the basic design principles behind trellis and more general lattice codes along with their performance
in AWGN.

Code designs for fading channels are covered in Section 8.8. These designs combine block or convolutional
codes with interleaving, and modify the code design to provide maximum fading diversity. Diversity gains can also
be obtained by combining coded modulation with symbol or bit interleaving, although bit interleaving generally
provides much higher diversity gain. Thus, coding combined with interleaving provides diversity gain in the same
manner as other forms of diversity, with the diversity order built into the code design. Unequal error protection is
an alternative to diversity in fading mitigation. In these codes bits are prioritized, and high priority bits are encoded
with stronger error protection against deep fades. Since bit priorities are part of the source code design, unequal
error protection is a special case of joint source and channel coding, which we also describe.

Coding is a very broad and deep subject, with many excellent books devoted solely to this topic. This chapter 
assumes no background in coding, and thus provides an in-depth discussion of code designs for AWGN channels 
before designs for wireless systems can be treated. This in-depth discussion can be omitted for a more cursory 
treatment of coding for wireless channels by focusing on Sections 8.1 and 8.8. 

8.1 Overview of Code Design



The main reason to apply error correction coding in a wireless system is to reduce the probability of bit or block
error. The bit error probability $P_b$ for a coded system is the probability that a bit is decoded in error. The block
error probability $P_{bl}$, also called the packet error rate, is the probability that one or more bits in a block of coded bits is
decoded in error. Block error probability is useful for packet data systems where bits are encoded and transmitted
in blocks. The amount of error reduction provided by a given code is typically characterized by its coding gain in
AWGN and its diversity gain in fading.

Coding gain in AWGN is defined as the amount that the SNR can be reduced under the coding technique
for a given $P_b$ or $P_{bl}$. We illustrate coding gain for $P_b$ in Figure 8.1. We see in this figure that the gain $C_{g1}$ at
$P_b = 10^{-4}$ is less than the gain $C_{g2}$ at $P_b = 10^{-6}$, and there is negligible coding gain at $P_b = 10^{-2}$. In fact
codes designed for high SNR channels can have negative coding gain at low SNRs, since the extra redundancy
required in the code does not provide sufficient performance gain at low SNRs to yield a positive coding gain.
Thus, unexpected fluctuations in channel SNR can significantly degrade code performance. Negative coding gain
can be avoided with systematic code designs, which have positive gain at all SNRs. The coding gain in AWGN is
generally a function of the minimum Euclidean distance of the code, which equals the minimum distance in signal
space between codewords or error events. Thus, codes designed for AWGN channels maximize their Euclidean
distance for good performance.

Error probability with or without coding tends to fall off with SNR as a waterfall shape at low to moderate
SNRs. While this waterfall shape holds at all SNRs for uncoded systems, coded systems exhibit error floors as
SNR grows. The error floor, also shown in Figure 8.1, kicks in at a threshold SNR which depends on the code
design. For SNRs above this threshold, error probability falls off much more slowly, due to the fact that minimum
distance error events eventually dominate code performance in this SNR regime.

For many codes, the error correction capability of a code does not come for free. This performance enhancement
is paid for by increased complexity and, for block codes, convolutional codes, turbo codes, and LDPC codes,
by either a decreased data rate or increase in signal bandwidth. Consider a code with $n$ coded bits for every $k$
uncoded bits. This code effectively embeds a $k$-dimensional subspace into a larger $n$-dimensional space to provide
larger distances between coded symbols. However, if the data rate through the channel is fixed at $R_b$, then the
information rate for a code that uses $n$ coded bits for every $k$ uncoded bits is $\frac{k}{n}R_b$, i.e. coding decreases the data
rate by the fraction $k/n$. We can keep the information rate constant and introduce coding gain by decreasing the
bit time by $k/n$. This typically results in an expanded bandwidth of the transmitted signal by $n/k$. Coded modulation
uses a joint design of the code and modulation to obtain coding gain without this bandwidth expansion, as
discussed in more detail in Section 8.7.

Codes designed for AWGN channels do not generally work well in fading due to bursts of errors that cannot
be corrected for. However, good performance in fading can be obtained by combining AWGN channel codes with
interleaving, and designing the code to optimize its inherent diversity. The interleaver spreads out bursts of errors
over time, so it provides a form of time diversity. This diversity is exploited by the inherent diversity in the code. In
fact, codes designed in this manner exhibit performance similar to MRC diversity, with diversity order equal to the
minimum Hamming distance of the code. Hamming distance is the number of coded symbols that differ between
different codewords or error events. Thus, coding and interleaving designed for fading channels maximize their
Hamming distance for good performance.

8.2 Linear Block Codes 

Linear block codes are conceptually simple codes that are basically an extension of single-bit parity check codes
for error detection. A single-bit parity check code is one of the most common forms of detecting transmission
errors. This code uses one extra bit in a block of $n$ data bits to indicate whether the number of 1s in a block is
odd or even. Thus, if a single error occurs, either the parity bit is corrupted or the number of detected 1s in the
information bit sequence will be different from the number used to compute the parity bit: in either case the parity
bit will not correspond to the number of detected 1s in the information bit sequence, so the single error is detected.
Linear block codes extend this notion by using a larger number of parity bits to either detect more than one error
or correct for one or more errors. Unfortunately linear block codes, along with convolutional codes, trade their
error detection or correction capability for either bandwidth expansion or a lower data rate, as will be discussed
in more detail below. We will restrict our attention to binary codes, where both the original information and the
corresponding code consist of bits taking a value of either 0 or 1.
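
The single-bit parity check described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and even parity is assumed (the text allows either an odd or even convention).

```python
def add_even_parity(data_bits):
    """Append one parity bit so the block contains an even number of 1s."""
    return data_bits + [sum(data_bits) % 2]


def parity_ok(block):
    """Return True when the received block still has an even number of 1s."""
    return sum(block) % 2 == 0


block = add_even_parity([1, 0, 1, 1])

one_error = list(block)
one_error[2] ^= 1          # flip a single bit

two_errors = list(one_error)
two_errors[0] ^= 1         # flip a second bit

print(parity_ok(block), parity_ok(one_error), parity_ok(two_errors))
```

A single flipped bit changes the parity and is detected, whereas two flipped bits restore the parity and pass unnoticed, which is the limitation that motivates using more parity bits.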

8.2.1 Binary Linear Block Codes 

A binary block code generates a block of $n$ coded bits from $k$ information bits. We call this an $(n, k)$ binary block
code. The coded bits are also called codeword symbols. The $n$ codeword symbols can take on $2^n$ possible values
corresponding to all possible combinations of the $n$ binary bits. We select $2^k$ codewords from these $2^n$ possibilities
to form the code, such that each $k$-bit information block is uniquely mapped to one of these $2^k$ codewords. The
rate of the code is $R_c = k/n$ information bits per codeword symbol. If we assume that codeword symbols are
transmitted across the channel at a rate of $R_s$ symbols/second, then the information rate associated with an $(n, k)$
block code is $R_b = R_c R_s = \frac{k}{n} R_s$ bits/second. Thus we see that block coding reduces the data rate compared to
what we obtain with uncoded modulation by the code rate $R_c$.

A block code is called a linear code when the mapping of the $k$ information bits to the $n$ codeword symbols
is a linear mapping. In order to describe this mapping and the corresponding encoding and decoding functions in
more detail, we must first discuss properties of the vector space of binary $n$-tuples and its corresponding subspaces.
The set of all binary $n$-tuples $B_n$ is a vector space over the binary field, which consists of the two elements 0 and
1. All fields have two operations, addition and multiplication: for the binary field these operations correspond
to binary addition (modulo 2 addition) and standard multiplication. A subset $S$ of $B_n$ is called a subspace if it
satisfies the following conditions:

1. The all-zero vector is in $S$.

2. The set $S$ is closed under addition, such that if $S_i \in S$ and $S_j \in S$, then $S_i + S_j \in S$.

An $(n, k)$ block code is linear if the $2^k$ length-$n$ codewords of the code form a subspace of $B_n$. Thus, if $C_i$ and $C_j$
are two codewords in an $(n, k)$ linear block code, then $C_i + C_j$ must form another codeword of the code.



Example 8.1: The vector space $B_3$ consists of all binary tuples of length 3:

$$B_3 = \{[000], [001], [010], [011], [100], [101], [110], [111]\}.$$

Note that $B_3$ is a subspace of itself, since it contains the all-zero vector and is closed under addition. Determine
which of the following subsets of $B_3$ form a subspace:

• $A_1 = \{[000], [001], [100], [101]\}$

• $A_2 = \{[000], [100], [110], [111]\}$

• $A_3 = \{[001], [100], [101]\}$

Solution: It is easily verified that $A_1$ is a subspace, since it contains the all-zero vector and the sum of any two
tuples in $A_1$ is also in $A_1$. $A_2$ is not a subspace since it is not closed under addition, as $110 + 111 = 001 \notin A_2$.
$A_3$ is not a subspace since it is not closed under addition ($001 + 001 = 000 \notin A_3$) and it does not contain the
all-zero vector.
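
The two subspace conditions can be checked mechanically for small sets such as those in Example 8.1. The sketch below is illustrative only; the helper names and the choice to represent tuples as bit strings are assumptions, not notation from the text.

```python
def add_mod2(a, b):
    """Component-wise modulo-2 addition of two equal-length bit tuples."""
    return tuple(x ^ y for x, y in zip(a, b))


def is_subspace(bit_strings):
    """Check the two conditions: contains the all-zero vector, closed under addition."""
    vectors = {tuple(int(ch) for ch in s) for s in bit_strings}
    if not vectors:
        return False
    n = len(next(iter(vectors)))
    if (0,) * n not in vectors:
        return False
    return all(add_mod2(a, b) in vectors for a in vectors for b in vectors)


B3 = ["000", "001", "010", "011", "100", "101", "110", "111"]
A1 = ["000", "001", "100", "101"]
A2 = ["000", "100", "110", "111"]
A3 = ["001", "100", "101"]

for name, subset in [("B3", B3), ("A1", A1), ("A2", A2), ("A3", A3)]:
    print(name, is_subspace(subset))
```

The exhaustive pairwise check costs on the order of the square of the set size, which is fine for toy examples but not for codes of practical length.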



Intuitively, the greater the distance between codewords in a given code, the less chance that errors introduced
by the channel will cause a transmitted codeword to be decoded as a different codeword. We define the Hamming
distance between two codewords $C_i$ and $C_j$, denoted as $d(C_i, C_j)$ or $d_{ij}$, as the number of elements in which they
differ:

$$d_{ij} = \sum_{l=1}^{n} C_i(l) \oplus C_j(l), \tag{8.1}$$

where $C_m(l)$ denotes the $l$th bit in $C_m$ and $\oplus$ denotes modulo-2 addition. For example, if $C_i = [00101]$ and
$C_j = [10011]$ then $d_{ij} = 3$. We
define the weight of a given codeword $C_i$ as the number of 1s in the codeword, so $C_i = [00101]$ has weight 2.
The weight of a given codeword $C_i$ is just its Hamming distance $d_{0i}$ with the all-zero codeword $C_0 = [00 \ldots 0]$
or, equivalently, the sum of its elements:

$$w(C_i) = \sum_{l=1}^{n} C_i(l). \tag{8.2}$$

Since $0 + 0 = 1 + 1 = 0$, the Hamming distance between $C_i$ and $C_j$ is equal to the weight of $C_i + C_j$. For
example, with $C_i = [00101]$ and $C_j = [10011]$ as given above, $w(C_i) = 2$, $w(C_j) = 3$, and $d_{ij} = w(C_i + C_j) =
w([10110]) = 3$. Since the Hamming distance between any two codewords equals the weight of their sum, and the
sum of two codewords in a linear code is itself a codeword, we
can determine the minimum distance between all codewords in a code by just looking at the minimum distance
between all codewords and the all-zero codeword. Thus, we define the minimum distance of a code as

$$d_{min} = \min_{i, i \neq 0} d_{0i}, \tag{8.3}$$

which implicitly defines $C_0$ as the all-zero codeword. We will see in Section 8.2.6 that the minimum distance of a
linear block code is a critical parameter in determining its probability of error.
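
The definitions of Hamming distance, weight, and minimum distance translate directly into code. The following sketch reuses the codewords from the example above; the function names are hypothetical, and the small code used for the minimum distance is the subspace $A_1$ from Example 8.1, chosen here purely for illustration.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit lists differ, as in (8.1)."""
    return sum(x ^ y for x, y in zip(a, b))


def weight(c):
    """Number of 1s in a codeword, as in (8.2)."""
    return sum(c)


def add_mod2(a, b):
    """Component-wise modulo-2 sum of two codewords."""
    return [x ^ y for x, y in zip(a, b)]


def min_distance_linear(codewords):
    """Minimum distance of a linear code: smallest weight of a nonzero codeword, as in (8.3)."""
    return min(weight(c) for c in codewords if any(c))


Ci = [0, 0, 1, 0, 1]
Cj = [1, 0, 0, 1, 1]
print(hamming_distance(Ci, Cj), weight(Ci), weight(Cj), weight(add_mod2(Ci, Cj)))

A1 = [[0, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 1]]
print(min_distance_linear(A1))
```

Note that `min_distance_linear` relies on linearity: for a nonlinear code the minimum over all codeword pairs would have to be computed instead.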



8.2.2 Generator Matrix 



The generator matrix is a compact description of how codewords are generated from information bits in a linear 
block code. The design goal in linear block codes is to find generator matrices such that their corresponding codes 
are easy to encode and decode yet have powerful error correction/detection capabilities. Consider an (n, k) code 
with k information bits denoted as 

$$U_i = [u_{i1}, \ldots, u_{ik}]$$

that are encoded into the codeword

$$C_i = [c_{i1}, \ldots, c_{in}].$$

We represent the encoding operation as a set of $n$ equations defined by

$$c_{ij} = u_{i1}g_{1j} + u_{i2}g_{2j} + \cdots + u_{ik}g_{kj}, \quad j = 1, \ldots, n, \tag{8.4}$$

where $g_{ij}$ is binary (0 or 1) and binary (standard) multiplication is used. We can write these $n$ equations in matrix
form as

$$C_i = U_i G, \tag{8.5}$$

where the $k \times n$ generator matrix $G$ for the code is defined as

$$G = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & & \vdots \\ g_{k1} & g_{k2} & \cdots & g_{kn} \end{bmatrix}. \tag{8.6}$$

If we denote the $l$th row of $G$ as $\mathbf{g}_l = [g_{l1}, \ldots, g_{ln}]$ then we can write any codeword $C_i$ as a linear combination of
these row vectors as follows:

$$C_i = u_{i1}\mathbf{g}_1 + u_{i2}\mathbf{g}_2 + \cdots + u_{ik}\mathbf{g}_k. \tag{8.7}$$

Since a linear $(n, k)$ block code is a subspace of dimension $k$ in the larger $n$-dimensional space, the $k$ row vectors
$\{\mathbf{g}_l\}_{l=1}^{k}$ of $G$ must be linearly independent, so that they span the $k$-dimensional subspace associated with the $2^k$
codewords. Hence, $G$ has rank $k$. Since the set of basis vectors for this subspace is not unique, the generator
matrix is also not unique.

A systematic linear block code is described by a generator matrix of the form

$$G = [I_k | P] = \begin{bmatrix} 1 & 0 & \cdots & 0 & p_{11} & p_{12} & \cdots & p_{1(n-k)} \\ 0 & 1 & \cdots & 0 & p_{21} & p_{22} & \cdots & p_{2(n-k)} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & p_{k1} & p_{k2} & \cdots & p_{k(n-k)} \end{bmatrix}, \tag{8.8}$$

where $I_k$ is a $k \times k$ identity matrix and $P$ is a $k \times (n-k)$ matrix that determines the redundant, or parity, bits to
be used for error correction or detection. The codeword output from a systematic encoder is of the form

$$C_i = U_i G = U_i [I_k | P] = [u_{i1}, \ldots, u_{ik}, p_1, \ldots, p_{(n-k)}], \tag{8.9}$$

where the first $k$ bits of the codeword are the original information bits and the last $(n-k)$ bits of the codeword are
the parity bits obtained from the information bits as

$$p_j = u_{i1}p_{1j} + \cdots + u_{ik}p_{kj}, \quad j = 1, \ldots, n-k. \tag{8.10}$$

Note that any generator matrix for an $(n, k)$ linear block code can be reduced by row operations and column
permutations to a generator matrix in systematic form.



Example 8.2: Systematic linear block codes are typically implemented with $n-k$ modulo-2 adders tied to the
appropriate stages of a shift register. The resulting parity bits are appended to the end of the information bits to
form the codeword. Find the corresponding implementation for generating a (7, 4) binary code with the generator
matrix

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}. \tag{8.11}$$

Solution: The matrix $G$ is already in systematic form with

$$P = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}. \tag{8.12}$$

Let $P_{ij}$ denote the $ij$th element of $P$. From (8.10), we see that the first parity bit in the codeword is $p_1 =
u_{i1}P_{11} + u_{i2}P_{21} + u_{i3}P_{31} + u_{i4}P_{41} = u_{i1} + u_{i2}$. Similarly, the second parity bit is $p_2 = u_{i1}P_{12} + u_{i2}P_{22} +
u_{i3}P_{32} + u_{i4}P_{42} = u_{i1} + u_{i4}$ and the third parity bit is $p_3 = u_{i1}P_{13} + u_{i2}P_{23} + u_{i3}P_{33} + u_{i4}P_{43} = u_{i2} + u_{i3}$. The
shift register implementation to generate these parity bits is shown in the following figure. The codeword output
is $[u_{i1} u_{i2} u_{i3} u_{i4} p_1 p_2 p_3]$, where the switch is in the down position to output the systematic bits $u_{ij}$, $j = 1, \ldots, 4$ of
the code, and in the up position to output the parity bits $p_j$, $j = 1, 2, 3$ of the code.

Figure 8.2: Implementation of (7,4) binary code.
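
As a software counterpart to the shift register of Example 8.2, the sketch below builds $G = [I_k | P]$ from the parity matrix $P$ in (8.12) and encodes by modulo-2 matrix multiplication. The function names are hypothetical and the list-of-lists matrix representation is an assumption made for brevity.

```python
# Parity matrix P from (8.12); row i holds the parity contributions of information bit i.
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]


def systematic_generator(parity):
    """Build G = [I_k | P] as a list of rows."""
    k = len(parity)
    return [[1 if i == j else 0 for j in range(k)] + list(parity[i]) for i in range(k)]


def encode(info_bits, generator):
    """Compute C = U G with modulo-2 arithmetic."""
    n = len(generator[0])
    return [
        sum(u * row[j] for u, row in zip(info_bits, generator)) % 2
        for j in range(n)
    ]


G = systematic_generator(P)
codeword = encode([1, 1, 0, 1], G)
print(codeword)  # first four bits are the information bits, last three the parity bits
```

For this $P$ the three parity bits reduce to $u_{i1} + u_{i2}$, $u_{i1} + u_{i4}$, and $u_{i2} + u_{i3}$ (modulo 2), matching the adders described in the example.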

8.2.3 Parity Check Matrix and Syndrome Testing

The parity check matrix is used to decode linear block codes with generator matrix $G$. The parity check matrix $H$
corresponding to a generator matrix $G = [I_k | P]$ is defined as

$$H = [P^T | I_{n-k}]. \tag{8.13}$$

It is easily verified that $GH^T = 0_{k,n-k}$, where $0_{k,n-k}$ denotes an all-zero $k \times (n-k)$ matrix. Recall that a given
codeword $C_i$ in the code is obtained by multiplication of the information bit sequence $U_i$ by the generator matrix
$G$: $C_i = U_i G$. Thus,

$$C_i H^T = U_i G H^T = 0_{n-k} \tag{8.14}$$

for any input sequence $U_i$, where $0_{n-k}$ denotes the all-zero row vector of length $n-k$. Thus, multiplication of
any valid codeword with the parity check matrix results in an all-zero vector. This property is used to determine
whether the received vector is a valid codeword or has been corrupted, based on the notion of syndrome testing,
which we now define.

Let $R$ be the received codeword resulting from transmission of codeword $C$. In the absence of channel errors,
$R = C$. However, if the transmission is corrupted, one or more of the codeword symbols in $R$ will differ from
those in $C$. We therefore write the received codeword as

$$R = C + e, \tag{8.15}$$

where $e = [e_1 e_2 \ldots e_n]$ is the error vector indicating which codeword symbols were corrupted by the channel. We
define the syndrome of $R$ as

$$S = RH^T. \tag{8.16}$$

If $R$ is a valid codeword, i.e. $R = C_i$ for some $i$, then $S = C_i H^T = 0_{n-k}$ by (8.14). Thus, the syndrome
equals the all-zero vector if the transmitted codeword is not corrupted, or is corrupted in a manner such that the
received codeword is a valid codeword in the code that is different from the transmitted codeword. If the received
codeword $R$ contains detectable errors, then $S \neq 0_{n-k}$. If the received codeword contains correctable errors,
then the syndrome identifies the error pattern corrupting the transmitted codeword, and these errors can then be
corrected. Note that the syndrome is a function only of the error pattern $e$ and not the transmitted codeword $C$,
since

$$S = RH^T = (C + e)H^T = CH^T + eH^T = 0_{n-k} + eH^T. \tag{8.17}$$

Since $S = eH^T$ corresponds to $n-k$ equations in $n$ unknowns, there are $2^k$ possible error patterns that can
produce a given syndrome $S$. However, since the probability of bit error is typically small and independent for
each bit, the most likely error pattern is the one with minimal weight, corresponding to the least number of errors
introduced in the channel. Thus, if an error pattern $\hat{e}$ is the most likely error associated with a given syndrome $S$,
the transmitted codeword is typically decoded as

$$\hat{C} = R + \hat{e} = C + e + \hat{e}. \tag{8.18}$$

When the most likely error pattern does occur, i.e. $\hat{e} = e$, then $\hat{C} = C$, i.e. the corrupted codeword is correctly
decoded. The decoding process and associated error probability will be covered in Section 8.2.6.
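
Syndrome testing can be illustrated with the (7,4) code of Example 8.2. The sketch below forms $H = [P^T | I_{n-k}]$ and checks that the syndrome of a received word depends only on the error pattern. The function names and the list-based matrix representation are assumptions made for illustration.

```python
# Parity matrix P from (8.12) in Example 8.2.
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]


def parity_check_matrix(parity):
    """Build H = [P^T | I_(n-k)] as a list of n-k rows of length n."""
    k, r = len(parity), len(parity[0])
    return [
        [parity[i][j] for i in range(k)] + [1 if j == m else 0 for m in range(r)]
        for j in range(r)
    ]


def syndrome(received, H):
    """Compute S = R H^T with modulo-2 arithmetic."""
    return [sum(x * h for x, h in zip(received, row)) % 2 for row in H]


H = parity_check_matrix(P)

C = [1, 1, 0, 1, 0, 0, 1]            # a valid codeword of this code
e = [0, 0, 0, 1, 0, 0, 0]            # single error in the fourth position
R = [c ^ x for c, x in zip(C, e)]    # received word R = C + e

print(syndrome(C, H), syndrome(R, H), syndrome(e, H))
```

The first syndrome is all zeros, and the second and third coincide, as (8.17) predicts. For this particular $P$, an error in the fourth position and an error in the sixth position produce the same syndrome, so the syndrome alone detects but cannot uniquely locate every single-bit error in this code.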

Let $C_w$ denote a codeword in a given $(n, k)$ code with minimum weight (excluding the all-zero codeword).
Then $C_w H^T = 0_{n-k}$ is just the sum of $d_{min}$ rows of $H^T$ (equivalently, $d_{min}$ columns of $H$), since $d_{min}$ equals
the number of 1s (the weight) in the minimum weight codeword of the code. Since the rank of $H^T$ is at most $n-k$,
this implies that the minimum distance of an $(n, k)$ block code is upper bounded by

$$d_{min} \le n - k + 1. \tag{8.19}$$

8.2.4 Cyclic Codes 



Cyclic codes are a subclass of linear block codes in which every cyclic shift of a codeword is also a codeword.
Specifically, if $C = (c_0 c_1 \ldots c_{n-1})$ is a codeword in a given code, then a cyclic shift by
1, denoted as $C^{(1)} = (c_{n-1} c_0 \ldots c_{n-2})$, is also a codeword. More generally, any cyclic shift
$C^{(i)} = (c_{n-i} c_{n-i+1} \ldots c_{n-i-1})$ is also a codeword. The cyclic nature of cyclic codes creates a nice structure
that allows their encoding and decoding functions to be of much lower complexity than the matrix multiplications
associated with encoding and decoding for general linear block codes. Thus, most linear block codes used in
practice are cyclic codes.

Cyclic codes are generated via a generator polynomial instead of a generator matrix. The generator polynomial
$g(X)$ for an $(n, k)$ cyclic code has degree $n-k$ and is of the form

$$g(X) = g_0 + g_1 X + \cdots + g_{n-k}X^{n-k}, \tag{8.20}$$

where $g_i$ is binary (0 or 1) and $g_0 = g_{n-k} = 1$. The $k$-bit information sequence $(u_0 \ldots u_{k-1})$ is also written in
polynomial form as the message polynomial

$$u(X) = u_0 + u_1 X + \cdots + u_{k-1}X^{k-1}. \tag{8.21}$$

The codeword associated with a given $k$-bit information sequence is obtained from the polynomial coefficients of
the generator polynomial times the message polynomial, i.e. the codeword $C = (c_0 \ldots c_{n-1})$ is obtained from

$$c(X) = u(X)g(X) = c_0 + c_1 X + \cdots + c_{n-1}X^{n-1}. \tag{8.22}$$

A codeword described by a polynomial $c(X)$ is a valid codeword for a cyclic code with generator polynomial $g(X)$
if and only if $g(X)$ divides $c(X)$ with no remainder (no remainder polynomial terms), i.e.

$$\frac{c(X)}{g(X)} = q(X) \tag{8.23}$$

for a polynomial $q(X)$ of degree less than $k$.



Example 8.3: 

Consider a (7,4) cyclic code with generator polynomial g(X) = 1 + X 2 + X 3 . Determine if the codewords 
described by polynomials ci(X) = 1 + X 2 + X 5 + X 6 and C 2 (X) = 1 + X 2 + +X 3 + X 5 + X 6 are valid 
codewords for this generator polynomial. 

Solution: Division of binary polynomials is si mi lar to division of standard polynomials except that under binary 
addition, subtraction is the same as addition. Dividing c\(X) = 1 + X 2 + X 5 + X 6 by g(X) = 1 + X 2 + X 3 , 
we have 



                   X^3 + 1
X^3 + X^2 + 1  ) X^6 + X^5       + X^2 + 1
                 X^6 + X^5 + X^3
                 -------------------------
                             X^3 + X^2 + 1
                             X^3 + X^2 + 1
                             -------------
                                         0

Since $g(X)$ divides $c_1(X)$ with no remainder, it is a valid codeword. In fact, we have $c_1(X) = (1 + X^3)g(X) =
u(X)g(X)$, so the information bit sequence corresponding to $c_1(X)$ is U = [1001], corresponding to the coefficients
of the message polynomial $u(X) = 1 + X^3$.

Dividing $c_2(X) = 1 + X^2 + X^3 + X^5 + X^6$ by $g(X) = 1 + X^2 + X^3$, we have

                   X^3
X^3 + X^2 + 1  ) X^6 + X^5 + X^3 + X^2 + 1
                 X^6 + X^5 + X^3
                 -------------------------
                                   X^2 + 1

where we note that there is a remainder of $X^2 + 1$ in the division. Thus, $c_2(X)$ is not a valid codeword for the
code corresponding to this generator polynomial.
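
The divisibility test of Example 8.3 can be reproduced with a short Python sketch. It is an illustration only, under the assumption that a binary polynomial is stored as an integer whose bit i holds the coefficient of X^i; the helper name `gf2_divmod` is hypothetical and not part of the text.

```python
def gf2_divmod(dividend: int, divisor: int) -> tuple[int, int]:
    """Divide binary polynomials; bit i of each int is the coefficient of X^i."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient, remainder = 0, dividend
    while remainder.bit_length() >= divisor.bit_length():
        shift = remainder.bit_length() - divisor.bit_length()
        quotient ^= 1 << shift          # record the quotient term X^shift
        remainder ^= divisor << shift   # over GF(2), subtraction is XOR
    return quotient, remainder


G = 0b1101       # g(X)   = 1 + X^2 + X^3
C1 = 0b1100101   # c_1(X) = 1 + X^2 + X^5 + X^6
C2 = 0b1101101   # c_2(X) = 1 + X^2 + X^3 + X^5 + X^6

q1, r1 = gf2_divmod(C1, G)   # zero remainder: c_1 is a valid codeword
q2, r2 = gf2_divmod(C2, G)   # nonzero remainder: c_2 is not
```

Here `q1` encodes the message polynomial 1 + X^3, and `r2` encodes the remainder X^2 + 1 found in the example.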



Recall that systematic linear block codes have the first k codeword symbols equal to the information bits, and
the remaining codeword symbols equal to the parity bits. A cyclic code can be put in systematic form by first
multiplying the message polynomial $u(X)$ by $X^{n-k}$, yielding

$X^{n-k} u(X) = u_0 X^{n-k} + u_1 X^{n-k+1} + \cdots + u_{k-1} X^{n-1}$. (8.24)



This shifts the message bits to the k rightmost digits of the codeword polynomial. If we next divide (8.24) by
$g(X)$, we obtain

$\frac{X^{n-k} u(X)}{g(X)} = q(X) + \frac{p(X)}{g(X)}$, (8.25)

where $q(X)$ is a polynomial of degree at most $k - 1$ and $p(X)$ is a remainder polynomial of degree at most $n - k - 1$.
Multiplying (8.25) through by $g(X)$ we obtain

$X^{n-k} u(X) = q(X)g(X) + p(X)$. (8.26)



Adding p(X) to both sides yields 

$p(X) + X^{n-k} u(X) = q(X)g(X)$. (8.27)

This implies that $p(X) + X^{n-k} u(X)$ is a valid codeword since it is divisible by $g(X)$ with no remainder. The
codeword is described by the n coefficients of the codeword polynomial $p(X) + X^{n-k} u(X)$. Note that we can
express $p(X)$ (of degree at most $n - k - 1$) as

$p(X) = p_0 + p_1 X + \cdots + p_{n-k-1} X^{n-k-1}$. (8.28)

Combining (8.24) and (8.28) we get

$p(X) + X^{n-k} u(X) = p_0 + p_1 X + \cdots + p_{n-k-1} X^{n-k-1} + u_0 X^{n-k} + u_1 X^{n-k+1} + \cdots + u_{k-1} X^{n-1}$. (8.29)

Thus, the codeword corresponding to this polynomial has the first k bits consisting of the message bits $[u_0 \ldots u_{k-1}]$
and the last $n - k$ bits consisting of the parity bits $[p_0 \ldots p_{n-k-1}]$, as is required for the systematic form.

Note that the systematic codeword polynomial is generated in three steps: first multiplying the message
polynomial $u(X)$ by $X^{n-k}$, then dividing $X^{n-k} u(X)$ by $g(X)$ to get the remainder polynomial $p(X)$ (along
with the quotient polynomial $q(X)$, which is not used), and finally adding $p(X)$ to $X^{n-k} u(X)$ to get (8.29). The
polynomial multiplications are straightforward to implement, and the polynomial division is easily implemented
with a feedback shift register [2, 1]. Thus, codeword generation for systematic cyclic codes has very low cost and
low complexity.
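
The three encoding steps can be sketched in Python as follows. This is a software illustration under stated assumptions, not the shift-register circuit itself: polynomials are stored as integers with bit i holding the coefficient of X^i, and the names `gf2_mod` and `encode_systematic` are hypothetical. The generator is the (7,4) polynomial of Example 8.3.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of binary polynomial division (bit i = coefficient of X^i)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    remainder = dividend
    while remainder.bit_length() >= divisor.bit_length():
        remainder ^= divisor << (remainder.bit_length() - divisor.bit_length())
    return remainder


def encode_systematic(u: int, g: int, n: int, k: int) -> int:
    """Return p(X) + X^(n-k) u(X) for message polynomial u and generator g."""
    shifted = u << (n - k)         # step 1: multiply u(X) by X^(n-k)
    parity = gf2_mod(shifted, g)   # step 2: remainder p(X) after dividing by g(X)
    return shifted ^ parity        # step 3: add p(X) to X^(n-k) u(X)


G = 0b1101                                      # g(X) = 1 + X^2 + X^3
codeword = encode_systematic(0b1001, G, n=7, k=4)
message_part = codeword >> 3                    # coefficients of X^3 .. X^6
parity_part = codeword & 0b111                  # coefficients of X^0 .. X^2
```

The high-order k coefficients of the result reproduce the message, and the result is divisible by g(X), as (8.27) requires.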

Let us now consider how to characterize channel errors for cyclic codes. The codeword polynomial corre- 
sponding to a transmitted codeword is of the form 

c(X)=u(X)g(X). (8.30) 

The received codeword can also be written in polynomial form as 

$r(X) = c(X) + e(X) = u(X)g(X) + e(X)$, (8.31)

where $e(X)$ is the error polynomial of degree at most $n - 1$ with coefficients equal to 1 where errors occur. For example, if
the transmitted codeword is C = [1011001] and the received codeword is R = [1111000] then $e(X) = X + X^{n-1}$.
The syndrome polynomial $s(X)$ for the received codeword is defined as the remainder when $r(X)$ is divided by
$g(X)$, so $s(X)$ has degree at most $n - k - 1$. By (8.31), $r(X)$ and $e(X)$ differ by $u(X)g(X)$, which is a multiple of
$g(X)$, so both leave the same remainder when divided by $g(X)$. Therefore, the syndrome polynomial
$s(X)$ is equivalent to the error polynomial $e(X)$ modulo $g(X)$. Moreover, we obtain the syndrome through a
division circuit similar to the one used for generating the code. As stated above, this division circuit is typically
implemented using a feedback shift register, resulting in a low-cost low-complexity implementation.
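
The relationship between the syndrome and the error polynomial can be checked numerically. The sketch below is illustrative only: it assumes the (7,4) code of Example 8.3, an error pattern with errors in positions 1 and 6 chosen for the demonstration, and a hypothetical helper `gf2_mod` that stores polynomials as integers (bit i is the coefficient of X^i).

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of binary polynomial division (bit i = coefficient of X^i)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    remainder = dividend
    while remainder.bit_length() >= divisor.bit_length():
        remainder ^= divisor << (remainder.bit_length() - divisor.bit_length())
    return remainder


G = 0b1101                    # g(X) = 1 + X^2 + X^3
c = 0b1100101                 # valid codeword c_1(X) from Example 8.3
e = (1 << 1) | (1 << 6)       # assumed error polynomial e(X) = X + X^6
r = c ^ e                     # received polynomial r(X) = c(X) + e(X)

syndrome = gf2_mod(r, G)          # s(X) = r(X) mod g(X)
error_residue = gf2_mod(e, G)     # e(X) mod g(X), equal to the syndrome
```

A zero syndrome therefore indicates either no error or an error pattern that is itself a multiple of g(X).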

8.2.5 Hard Decision Decoding (HDD) 

The probability of error for linear block codes depends on whether the decoder uses soft decisions or hard decisions.
In hard decision decoding (HDD) each coded bit is demodulated as a 0 or 1, i.e. the demodulator detects each coded
bit (symbol) individually. For example, in BPSK, the received symbol is decoded as a 1 if it is closer to $\sqrt{E_b}$, and as
a 0 if it is closer to $-\sqrt{E_b}$. This form of decoding removes information that can be used by the channel decoder. In
particular, for the BPSK example the distance of the received bit from $\sqrt{E_b}$ and $-\sqrt{E_b}$ can be used in the channel
decoder to make better decisions about the transmitted codeword. When these distances are used in the channel
decoder it is called soft-decision decoding. Soft decision decoding of linear block codes is treated in Section 8.2.7.

Hard decision decoding uses minimum-distance decoding based on Hamming distance. In minimum-distance
decoding the n bits corresponding to a codeword are first demodulated, and the demodulator output is passed to
the decoder. The decoder compares this received codeword to the $2^k$ possible codewords comprising the code, and
decides in favor of the codeword that is closest in Hamming distance (differs in the fewest bits) to the
received codeword. Mathematically, for a received codeword R the decoder uses the formula

pick $C_j$ s.t. $d(C_j, R) \le d(C_i, R) \;\; \forall i \ne j$. (8.32)

If there is more than one codeword with the same minimum distance to R, one of these is chosen at random by the
decoder.

Maximum-likelihood decoding picks the transmitted codeword that has the highest probability of having
produced the received codeword, i.e. given the received codeword R, the maximum-likelihood decoder chooses the
codeword $C_j$ as

$C_j = \arg\max_{C_i} p(R|C_i), \quad i = 1, \ldots, 2^k$. (8.33)

Since the most probable error event in an AWGN channel is the event with the minimum number of errors needed
to produce the received codeword, the minimum-distance criterion (8.32) and the maximum-likelihood criterion
(8.33) are equivalent. Once the maximum-likelihood codeword $C_j$ is determined, it is decoded to the k bits that
produce codeword $C_j$.
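
The minimum-distance rule (8.32) amounts to an exhaustive search over the codebook, as the following Python sketch illustrates. The codebook is the (5,2) code that appears in Example 8.4; the function names are hypothetical, and ties are broken by taking the first closest codeword rather than a random one, which is a simplification made for this sketch.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance_decode(received, codebook):
    """Return the codeword closest to `received` in Hamming distance.

    Assumption: ties go to the first closest codeword in `codebook`.
    """
    return min(codebook, key=lambda cw: hamming_distance(cw, received))


CODEBOOK = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 0, 1),
    (1, 1, 1, 1, 0),
]

received = (0, 1, 1, 1, 1)   # codeword 01011 with its third bit flipped
decoded = min_distance_decode(received, CODEBOOK)
```

The search cost grows with the $2^k$ codewords, which is why structured codes with simpler decoders matter in practice.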

Since maximum-likelihood detection of codewords is based on a distance decoding metric, we can best illustrate
this process in signal space, as shown in Figure 8.3. The minimum Hamming distance between codewords,
illustrated by the black dots in this figure, is $d_{min}$. Each codeword is centered inside a circle of radius $t = \lfloor (d_{min} - 1)/2 \rfloor$,
where $\lfloor x \rfloor$ denotes the largest integer less than or equal to x. The shaded dots represent received codewords
where one or more bits differ from those of the transmitted codeword. The figure indicates that $C_1$ and $C_2$ differ
by 3 bits.




Figure 8.3: Maximum-Likelihood Decoding in Signal Space. 

Minimum distance decoding can be used to either detect or correct errors. Detected errors in a data block
cause either the data to be dropped or a retransmission of the data. Error correction allows the corruption in the
data to be reversed. For error correction the minimum distance decoding process ensures that a received codeword
lying within a Hamming distance t from the transmitted codeword will be decoded correctly. Thus, the decoder
can correct up to t errors, as can be seen from Figure 8.3: since received codewords corresponding to t or fewer
errors will lie within the sphere centered around the correct codeword, they will be decoded as that codeword using
minimum distance decoding. We see from Figure 8.3 that the decoder can detect all error patterns of $d_{min} - 1$ or fewer
errors. In fact, a decoder for an (n, k) code can detect $2^n - 2^k$ possible error patterns. The reason is that there are
$2^k - 1$ nondetectable errors, corresponding to the case where a corrupted codeword is exactly equal to a codeword in
the set of possible codewords (of size $2^k$) that is not equal to the transmitted codeword. Since there are $2^n - 1$ total
possible error patterns, this yields $2^n - 2^k$ detectable error patterns. Note that this is not hard-decision decoding,
as we are not correcting errors, just detecting them.



Example 8.4:

A (5,2) code has codewords $C_0$ = [00000], $C_1$ = [01011], $C_2$ = [10101], and $C_3$ = [11110]. Suppose the
all zero codeword $C_0$ is transmitted. Find the set of error patterns corresponding to nondetectable errors for this
codeword transmission.

Solution: The nondetectable error patterns correspond to the three nonzero codewords, i.e. $e_1$ = [01011], $e_2$ =
[10101], and $e_3$ = [11110] are nondetectable error patterns, since adding any of these to $C_0$ results in a valid
codeword.
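
The count of detectable and nondetectable error patterns can be confirmed by brute force. The Python sketch below is illustrative; it enumerates every nonzero error pattern for the (5,2) code of Example 8.4 and checks whether the corrupted word is still a codeword. Variable names are chosen for the sketch only.

```python
from itertools import product

CODEBOOK = {
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 0, 1),
    (1, 1, 1, 1, 0),
}
n, k = 5, 2
transmitted = (0, 0, 0, 0, 0)

nondetectable = []
detectable_count = 0
for error in product((0, 1), repeat=n):
    if not any(error):
        continue                      # skip the all-zero (no error) pattern
    received = tuple(c ^ e for c, e in zip(transmitted, error))
    if received in CODEBOOK:
        nondetectable.append(error)   # corrupted word looks like a valid codeword
    else:
        detectable_count += 1
```

The loop finds $2^k - 1 = 3$ nondetectable patterns and $2^n - 2^k = 28$ detectable ones, matching the counts given in the text.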



8.2.6 Probability of Error for HDD in AWGN 

The probability of codeword error $P_e$ is defined as the probability that a transmitted codeword is decoded in error.
Under hard decision decoding a received codeword may be decoded in error if it contains more than t errors (it will
not be decoded in error if there is no alternative codeword closer to the received codeword than the transmitted
codeword). The error probability is thus bounded above by the probability that more than t errors occur. Since the
bit errors in a codeword occur independently on an AWGN channel, this probability is given by:

$P_e \le \sum_{j=t+1}^{n} \binom{n}{j} p^j (1-p)^{n-j}$, (8.34)

where p is the probability of error associated with transmission of the bits in the codeword. Thus, p corresponds
to the error probability associated with uncoded modulation for the given energy per codeword symbol, as treated
in Chapter 6 for AWGN channels. For example, if the codeword symbols are sent via coherent BPSK modulation,
we have $p = Q(\sqrt{2E_c/N_0})$, where $E_c$ is the energy per codeword symbol and $N_0$ is the noise power spectral
density. Since there are k/n information bits per codeword symbol, the relationship between the energy per bit
and the energy per symbol is $E_c = kE_b/n$. Thus, powerful block codes with a large number of parity bits (k/n
small) reduce the channel energy per symbol and therefore increase the error probability in demodulating the
codeword symbols. However, the error correction capability of these codes typically more than compensates for
this reduction, especially at high SNRs. At low SNRs this may not happen, in which case the code exhibits
negative coding gain, i.e. it performs worse than uncoded modulation. The bound (8.34) holds with equality
when the decoder corrects exactly t or fewer errors in a codeword, and cannot correct for more than t errors in a
codeword. A code with this property is called a perfect code.
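
The bound (8.34) is a binomial tail and is simple to evaluate numerically. The following Python sketch is illustrative: the function name is hypothetical, and the parameters n = 7, t = 1 (those of a (7,4) Hamming code) and p = 0.01 are assumed values chosen only for the demonstration.

```python
from math import comb


def hdd_codeword_error_bound(n: int, t: int, p: float) -> float:
    """Right side of (8.34): probability of more than t errors in n symbols."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(t + 1, n + 1))


# Assumed example values: n = 7, t = 1, symbol error probability p = 0.01.
bound = hdd_codeword_error_bound(n=7, t=1, p=0.01)

# Equivalent form: one minus the probability of t or fewer errors.
complement = 1 - sum(comb(7, j) * 0.01**j * 0.99 ** (7 - j) for j in range(2))
```

For a perfect code the value returned is the exact codeword error probability rather than only an upper bound.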

At high SNRs the most likely way to make a codeword error is to mistake a codeword for one of its nearest
neighbors. Nearest-neighbor errors yield a pair of upper and lower bounds on error probability. The lower bound
is the probability of mistaking a codeword for a given nearest neighbor at distance $d_{min}$:

$P_e \ge \sum_{j=t+1}^{d_{min}} \binom{d_{min}}{j} p^j (1-p)^{d_{min}-j}$. (8.35)

The upper bound, a union bound, assumes that all of the other $2^k - 1$ codewords are at distance $d_{min}$ from the
transmitted codeword. Thus, the union bound is just $2^k - 1$ times (8.35), the probability of mistaking a given
codeword for a nearest neighbor at distance $d_{min}$:

$P_e \le (2^k - 1) \sum_{j=t+1}^{d_{min}} \binom{d_{min}}{j} p^j (1-p)^{d_{min}-j}$. (8.36)

When the number of codewords is large or the SNR is low, both of these bounds are quite loose.

A tighter upper bound can be obtained by applying the Chernoff bound, $P(X \ge x) \le e^{-x^2/2}$ for X a
zero-mean unit variance Gaussian random variable, to compute codeword error probability. Using this bound it
can be shown [3] that the probability of decoding the all-zero codeword as the jth codeword with weight $w_j$ is
upper bounded by

$P(w_j) \le [4p(1-p)]^{w_j/2}$. (8.37)

Since the probability of decoding error is upper bounded by the probability of mistaking the all-zero codeword for
any of the other codewords, we get the upper bound

$P_e \le \sum_{j=2}^{2^k} [4p(1-p)]^{w_j/2}$. (8.38)

This bound requires the weight distribution $\{w_j\}_{j=2}^{2^k}$ for all codewords (other than the all-zero codeword
corresponding to j = 1) in the code. A simpler, slightly looser upper bound is obtained from (8.38) by using $d_{min}$
instead of the individual codeword weights. This simplification yields the bound

$P_e \le (2^k - 1)[4p(1-p)]^{d_{min}/2}$. (8.39)

Note that the probability of codeword error P e depends on p, which is a function of the Euclidean distance 
between modulation points associated with the transmitted codeword symbols. In fact, the best codes for AWGN 
channels should not be based on Hamming distance: they should be based on maximizing the Euclidean distance 
between the codewords after modulation. However, this requires that the channel code be designed jointly with the 
modulation. This is the basic concept of trellis codes and turbo trellis coded modulation, which will be discussed 
in Section 8.7. However, Hamming distance is a better measure of code performance in fading when codes are
combined with interleaving, as discussed in Section 8.8.

The probability of bit error after decoding the received codeword in general depends on the particular code
and decoder, in particular how bits are mapped to codewords, similar to the bit mapping procedure associated with
non-binary modulation. This bit error probability is often approximated as [1]

$P_b \approx \frac{1}{n} \sum_{j=t+1}^{n} j \binom{n}{j} p^j (1-p)^{n-j}$, (8.40)

which, for t = 1, can be simplified to [1] $P_b \approx p - p(1-p)^{n-1}$.



Example 8.5: Consider a (24,12) linear block code with a minimum distance $d_{min} = 8$ (an extended Golay code,
discussed in Section 8.2.8, is one such code). Find $P_e$ based on the loose bound (8.39), assuming the codeword
symbols are transmitted over the channel using BPSK modulation with $E_b/N_0 = 10$ dB. Also find $P_b$ for this code
using the approximation $P_b = P_e/k$ and compare with the bit error probability for uncoded modulation.

Solution: For $E_b/N_0 = 10$ dB $= 10$, we have $E_c/N_0 = \frac{12}{24} \cdot 10 = 5$. Thus, $p = Q(\sqrt{10}) = 7.82 \cdot 10^{-4}$. Using
this value in (8.39) with k = 12 and $d_{min} = 8$ yields $P_e \le 3.92 \cdot 10^{-7}$. Using the $P_b$ approximation we get
$P_b \approx \frac{1}{12} P_e = 3.27 \cdot 10^{-8}$. For uncoded modulation we have $P_b = Q(\sqrt{2E_b/N_0}) = Q(\sqrt{20}) = 3.87 \cdot 10^{-6}$. So
the bit error probability is reduced by roughly two orders of magnitude with this code. Note that the loose bound can be orders of magnitude
away from the true error probability, as we will see in the next example, so this calculation may significantly
underestimate the coding gain of the code.
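
The arithmetic of Example 8.5 can be checked with a few lines of Python. This is an illustrative sketch: the helper `q_function` expresses the Gaussian Q function through the standard library's complementary error function, and the variable names are chosen for the sketch rather than taken from the text.

```python
from math import erfc, sqrt


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2))


n, k, d_min = 24, 12, 8
eb_n0 = 10 ** (10 / 10)                 # 10 dB converted to a linear ratio
ec_n0 = (k / n) * eb_n0                 # energy per coded symbol, E_c = k E_b / n

p = q_function(sqrt(2 * ec_n0))                          # coded symbol error probability
pe_bound = (2**k - 1) * (4 * p * (1 - p)) ** (d_min / 2)  # loose bound (8.39)
pb_coded = pe_bound / k                                  # approximation P_b = P_e / k
pb_uncoded = q_function(sqrt(2 * eb_n0))                 # uncoded BPSK
```

Running the sketch gives values that agree with the figures quoted in the example to within rounding.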







8.2.7 Probability of Error for SDD in AWGN 



The HDD described in the previous section discards information that can reduce probability of codeword error. For
example, in BPSK, the transmitted signal constellation is $\pm\sqrt{E_b}$ and the received symbol after matched filtering is
decoded as a 1 if it is closer to $\sqrt{E_b}$ and as a 0 if it is closer to $-\sqrt{E_b}$. Thus, the distance of the received symbol
from $\sqrt{E_b}$ and $-\sqrt{E_b}$ is not used in decoding, yet this information can be used to make better decisions about the
transmitted codeword. When these distances are used in the channel decoder it is called soft-decision decoding
(SDD), since the demodulator does not make a hard decision about whether a 0 or 1 bit was transmitted, but rather
makes a soft decision corresponding to the distance between the received symbol and the symbol corresponding
to a 0 or a 1 bit transmission. We now describe the basic premise of SDD for BPSK modulation: these ideas are
easily extended to higher level modulations.

Consider a codeword transmitted over a channel using BPSK. As in the case of HDD, the energy per codeword
symbol is $E_c = \frac{k}{n} E_b$. If the jth codeword symbol is a 1, it will be received as $r_j = \sqrt{E_c} + n_j$ and if it is a 0, it will
be received as $r_j = -\sqrt{E_c} + n_j$, where $n_j$ is the AWGN noise sample of mean zero and variance $N_0/2$ associated
with the receiver. In SDD, given a received codeword $R = [r_1, \ldots, r_n]$, the decoder forms a correlation metric
$C(R, C_i)$ for each codeword $C_i$, $i = 1, \ldots, 2^k$, in the code, and the decoder chooses the codeword $C_i$ with the
highest correlation metric. The correlation metric is defined as

$C(R, C_i) = \sum_{j=1}^{n} (2c_{ij} - 1) r_j$, (8.41)

where $c_{ij}$ denotes the jth coded bit in the codeword $C_i$. If $c_{ij} = 1$, $2c_{ij} - 1 = 1$ and if $c_{ij} = 0$, $2c_{ij} - 1 = -1$.
So the received codeword symbol is weighted by the polarity associated with the corresponding symbol in the
codeword for which the correlation metric is being computed. Thus, $C(R, C_i)$ is large when most of the received
symbols have a large magnitude and the same polarity as the corresponding symbols in $C_i$, is smaller when most
of the received symbols have a small magnitude and the same polarity as the corresponding symbols in $C_i$, and is
typically negative when most of the received symbols have a different polarity than the corresponding symbols in
$C_i$. In particular, at very high SNRs, if $C_i$ is transmitted then $C(R, C_i) \approx n\sqrt{E_c}$ while $C(R, C_j) < n\sqrt{E_c}$ for
$j \ne i$.
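
The correlation metric (8.41) can be illustrated with a small Python sketch. It assumes the polarity implied by the weights $(2c_{ij} - 1)$, namely that bit 1 is sent as $+\sqrt{E_c}$ and bit 0 as $-\sqrt{E_c}$, with $E_c = 1$. The codebook is the (5,2) code of Example 8.4, and the received samples are made-up values chosen so that hard and soft decisions disagree; all function names are hypothetical.

```python
def correlation_metric(received, codeword):
    """Metric (8.41): sum of received samples weighted by +1 (bit 1) or -1 (bit 0)."""
    return sum((2 * c - 1) * r for c, r in zip(codeword, received))


def soft_decision_decode(received, codebook):
    """Pick the codeword with the largest correlation metric."""
    return max(codebook, key=lambda cw: correlation_metric(received, cw))


def hard_decision_decode(received, codebook):
    """Threshold each sample at zero, then pick the closest codeword in Hamming distance."""
    bits = tuple(1 if r > 0 else 0 for r in received)
    return min(codebook, key=lambda cw: sum(a != b for a, b in zip(cw, bits)))


CODEBOOK = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 0, 1),
    (1, 1, 1, 1, 0),
]

# Assumed received samples for transmitted codeword 01011: the two "0" symbols
# were pushed slightly positive by noise, so both are weak hard-decision errors.
received = (0.1, 1.2, 0.1, 0.9, 1.1)

soft = soft_decision_decode(received, CODEBOOK)
hard = hard_decision_decode(received, CODEBOOK)
```

In this constructed case the soft decoder recovers 01011, because the two erroneous samples have small magnitude, while the hard decoder sees two bit errors and selects 11110.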

For an AWGN channel, the probability of codeword error is the same for any codeword of a linear code. Let
us assume the all zero codeword $C_1$ is transmitted and the corresponding received codeword is R. To correctly
decode R, we must have that $C(R, C_1) > C(R, C_i)$, $i = 2, \ldots, 2^k$. Let $w_i$ denote the Hamming weight of the ith
codeword $C_i$, which equals the number of 1s in $C_i$. Then conditioned on the transmitted codeword $C_1$, $C(R, C_i)$
is Gauss-distributed with mean $\sqrt{E_c}\, n(1 - 2w_i/n)$ and variance $nN_0/2$. Note that the correlation metrics are not
independent, since they are all functions of R. The probability $P_e(C_i) = p(C(R, C_1) < C(R, C_i))$ can be shown
to equal the probability that a Gauss-distributed random variable with variance $2w_iN_0$ is less than $-2w_i\sqrt{E_c}$, i.e.

$P_e(C_i) = Q\left(\frac{2w_i\sqrt{E_c}}{\sqrt{2w_iN_0}}\right) = Q\left(\sqrt{2w_i\gamma_bR_c}\right)$. (8.42)

Then by the union bound the probability of error is upper bounded by the sum of pairwise error probabilities
relative to each $C_i$:

$P_e \le \sum_{i=2}^{2^k} P_e(C_i) = \sum_{i=2}^{2^k} Q\left(\sqrt{2w_i\gamma_bR_c}\right)$. (8.43)

The computation of (8.43) requires the weight distribution $w_i$, $i = 2, \ldots, 2^k$, of the code. This bound can be
simplified by noting that $w_i \ge d_{min}$, so

$P_e \le (2^k - 1)\, Q\left(\sqrt{2\gamma_bR_cd_{min}}\right)$. (8.44)

A well-known bound on the Q function is $Q(\sqrt{2x}) < \exp[-x]$. Applying this bound to (8.44) yields

$P_e \le (2^k - 1)e^{-\gamma_bR_cd_{min}} < 2^k e^{-\gamma_bR_cd_{min}} = e^{-\gamma_bR_cd_{min} + k\ln 2}$. (8.45)

Comparing this bound with that of uncoded BPSK modulation

$P_b = Q(\sqrt{2\gamma_b}) \le e^{-\gamma_b}$, (8.46)

we get a dB coding gain of approximately

$G_c = 10\log_{10}[(\gamma_bR_cd_{min} - k\ln 2)/\gamma_b] = 10\log_{10}[R_cd_{min} - k\ln 2/\gamma_b]$. (8.47)

Note that the coding gain depends on the code rate, the number of information bits per codeword, the minimum
distance of the code, and the channel SNR. In particular, the coding gain decreases as $\gamma_b$ decreases, and becomes negative
at sufficiently low SNRs. In general the performance of SDD is about 2-3 dB better than HDD [2, Chapter 8.1].



Example 8.6: Find the approximate coding gain of SDD over uncoded modulation for the (24,12) code with
$d_{min} = 8$ considered in Example 8.5 above, with $\gamma_b = 10$ dB.

Solution: Setting $\gamma_b = 10$, $R_c = 12/24$, $d_{min} = 8$, and k = 12 in (8.47) yields $G_c = 5$ dB. This significant coding
gain is a direct result of the large minimum distance of the code.
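
The coding gain approximation (8.47) is easy to evaluate for other codes and SNRs. The Python sketch below is illustrative; the function name is hypothetical, and it raises an error when the argument of the logarithm is not positive, which is the low-SNR regime where the approximation no longer yields a gain in dB.

```python
from math import log, log10


def sdd_coding_gain_db(n: int, k: int, d_min: int, gamma_b: float) -> float:
    """Approximate SDD coding gain (8.47) in dB; gamma_b is E_b/N_0 as a linear ratio."""
    rate = k / n
    argument = rate * d_min - k * log(2) / gamma_b
    if argument <= 0:
        raise ValueError("SNR too low: (8.47) does not give a positive argument")
    return 10 * log10(argument)


# Example 8.6: (24,12) code with d_min = 8 at gamma_b = 10 dB (linear value 10).
gain_db = sdd_coding_gain_db(n=24, k=12, d_min=8, gamma_b=10.0)
```

For these parameters the result is close to 5 dB, in agreement with the example.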



8.2.8 Common Linear Block Codes 

We now describe some common linear block codes. More details can be found in [1, 2, 4]. The most common type
of block code is a Hamming code, which is parameterized by an integer $m \ge 2$. For an (n, k) Hamming code,
$n = 2^m - 1$ and $k = 2^m - m - 1$, so $n - k = m$ redundant bits are introduced by the code. The minimum
distance of all Hamming codes is $d_{min} = 3$, so t = 1 error in the $n = 2^m - 1$ codeword symbols can be corrected.
Although Hamming codes are not very powerful, they are perfect codes, and therefore have probability of error
given exactly by the right side of (8.34).

Golay and extended Golay codes are another class of channel codes with good performance. The Golay code
is a linear (23,12) code with $d_{min} = 7$ and t = 3. The extended Golay code is obtained by adding a single parity
bit to the Golay code, resulting in a (24,12) block code with $d_{min} = 8$ and t = 3. The extra parity bit does not
change the error correction capability since t remains the same, but it greatly simplifies implementation since the
information bit rate is one half the coded bit rate. Thus, both uncoded and coded bit streams can be generated
by the same clock using every other clock sample to generate the uncoded bits. These codes have higher $d_{min}$
and thus better error correction capabilities than Hamming codes, at a cost of more complex decoding and a lower
code rate $R_c = k/n$. The lower code rate implies that the code either has a lower data rate or requires additional
bandwidth.

Another powerful class of block codes is the Bose-Chaudhuri-Hocquenghem (BCH) codes. These codes are
cyclic codes, and at high rates typically outperform all other block codes with the same n and k at moderate to high
SNRs. This code class provides a large selection of block lengths, code rates, and error correction capabilities. In
particular, the most common BCH codes have $n = 2^m - 1$ for any integer $m \ge 3$.

The $P_b$ for a number of BCH codes under hard decision decoding and coherent BPSK modulation is shown
in Figure 8.4. The plot is based on the approximation (8.40) where, for coherent BPSK, we have

$p = Q\left(\sqrt{2E_c/N_0}\right) = Q\left(\sqrt{2R_c\gamma_b}\right)$. (8.48)

In this figure the BCH (127,36) code actually has a negative coding gain at low SNRs. This is not uncommon for
powerful channel codes due to their reduced energy per symbol, as was discussed in Section 8.2.6.




Figure 8.4: $P_b$ for different BCH codes.



8.2.9 Nonbinary Block Codes: the Reed Solomon Code 

A nonbinary block code has similar properties to the binary code: it has K information bits mapped into codewords
of length N. However the N codeword symbols of each codeword are chosen from a nonbinary alphabet of size
$q > 2$. Thus, the codeword symbols can take any value in $\{0, 1, \ldots, q - 1\}$. Usually $q = 2^k$ so that k information
bits can be mapped into one codeword symbol.

The most common nonbinary block code is the Reed Solomon (RS) code, used in a range of applications
from magnetic recording to Cellular Digital Packet Data (CDPD). RS codes have $N = q - 1 = 2^k - 1$ and
$K = 1, 2, \ldots, N - 1$. The value of K dictates the error correction capability of the code. Specifically, a RS code
can correct up to $t = \lfloor 0.5(N - K) \rfloor$ codeword symbol errors. In nonbinary codes the minimum distance between
codewords is defined as the number of codeword symbols in which the codewords differ. RS codes achieve a
minimum distance of $d_{min} = N - K + 1$, which is the largest possible minimum distance between codewords for
any linear code with the same encoder input and output block lengths.

Since nonbinary codes, and RS codes in particular, generate symbols from an alphabet of size $2^k$, they are sometimes
used with M-ary modulation techniques for $M = 2^k$. In particular, with $2^k$-ary modulation each codeword
symbol is transmitted over the channel as one of $2^k$ possible constellation points. If the error probability associated
with the modulation (the probability of mistaking the received constellation point for a constellation point other
than the transmitted point) is $P_M$, then the probability of symbol error associated with the nonbinary code is upper
bounded by

$P_s \le \sum_{j=t+1}^{N} \binom{N}{j} P_M^j (1 - P_M)^{N-j}$, (8.49)

similar to the form for the binary code (8.34). The probability of bit error is then

$P_b = \frac{2^{k-1}}{2^k - 1} P_s$. (8.50)
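
The RS code parameters and the two error expressions above can be tabulated with a short Python sketch. It is illustrative only: the function names are hypothetical, and the choice k = 8, K = 223 (which gives the widely used (255,223) code) is an assumed example rather than one taken from the text.

```python
from math import comb


def rs_parameters(k_bits: int, K: int) -> tuple[int, int, int]:
    """Return (N, t, d_min) for an RS code over an alphabet of size 2^k_bits."""
    N = 2**k_bits - 1
    if not 1 <= K <= N - 1:
        raise ValueError("K must lie between 1 and N - 1")
    t = (N - K) // 2          # correctable symbol errors, floor(0.5 (N - K))
    d_min = N - K + 1         # minimum distance in codeword symbols
    return N, t, d_min


def rs_symbol_error_bound(N: int, t: int, p_m: float) -> float:
    """Upper bound (8.49) on symbol error probability for modulation error p_m."""
    return sum(comb(N, j) * p_m**j * (1 - p_m) ** (N - j) for j in range(t + 1, N + 1))


def rs_bit_error(k_bits: int, p_s: float) -> float:
    """Bit error probability (8.50) from symbol error probability p_s."""
    return 2 ** (k_bits - 1) / (2**k_bits - 1) * p_s


N, t, d_min = rs_parameters(k_bits=8, K=223)
ps_bound = rs_symbol_error_bound(N, t, p_m=1e-2)   # assumed modulation error rate
pb = rs_bit_error(8, ps_bound)
```

For large alphabets the factor in (8.50) approaches one half, so the bit error probability is roughly half the symbol error probability.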



8.3 Convolutional Codes 

A convolutional code generates coded symbols by passing the information bits through a linear finite-state shift
register, as shown in Figure 8.5. The shift register consists of K stages with k bits per stage. There are n binary
addition operators with inputs taken from all K stages: these operators produce a codeword of length n for each k
bit input sequence. Specifically, the binary input data is shifted into each stage of the shift register k bits at a time,
and each of these shifts produces a coded sequence of length n. The rate of the code is $R_c = k/n$. The number
of shift register stages K is called the constraint length of the code. It is clear from Figure 8.5 that a length-n
codeword depends on kK input bits, in contrast to a block code which only depends on k input bits. Convolutional
codes are said to have memory since the current codeword depends on more input bits (kK) than the number input
to the encoder to generate it (k).



[Figure 8.5: Convolutional Encoder. Labels: Stage 1, Stage 2, ..., Stage K; length-n codeword.]



8.3.1 Code Characterization: Trellis Diagrams 

When a length-n codeword is generated by a convolutional encoder, this codeword depends both on the k bits input
to the first stage of the shift register as well as the state of the encoder, defined as the contents in the other K - 1
stages of the shift register. In order to characterize a convolutional code, we must characterize how the codeword
generation depends both on the k input bits and the encoder state, which has $2^{K-1}$ possible values. There are
multiple ways to characterize convolutional codes, including a tree diagram, state diagram, and trellis diagram [2].
The tree diagram represents the encoder in the form of a tree where each branch represents a different encoder
state and the corresponding encoder output. A state diagram is a graph showing the different states of the encoder
and the possible state transitions and corresponding encoder outputs. A trellis diagram uses the fact that the tree
representation repeats itself once the number of stages in the tree exceeds the constraint length of the code. The
trellis diagram simplifies the tree representation by merging nodes in the tree corresponding to the same encoder
state. In this section we will focus on the trellis representation of a convolutional code since it is the most common
characterization. The details of the trellis diagram representation are best described by an example.

Consider the convolutional encoder shown in Figure 8.6 with n = 3, k = 1, and K = 3. In this encoder,
one bit at a time is shifted into Stage 1 of the 3-stage shift register. At a given time t we denote the bit in Stage i
of the shift register as $S_i$. The 3 stages of the shift register are used to generate a codeword of length 3, $C_1C_2C_3$,
where from the figure we see that $C_1 = S_1 + S_2$, $C_2 = S_1 + S_2 + S_3$, and $C_3 = S_3$. A bit sequence U shifted
into the encoder generates a sequence of coded symbols, which we denote by C. Note that the coded symbols
corresponding to $C_3$ are just the original information bits. As with block codes, when one of the coded symbols
in a convolutional code corresponds to the original information bits, we say that the code is systematic. We define
the encoder state as $S = S_2S_3$, i.e. the contents of the last two stages of the encoder, and there are $2^2 = 4$ possible
values for this encoder state. To characterize the encoder, we must show for each input bit and each possible
encoder state what the encoder output will be, and how the new input bit changes the encoder state for the next
input bit.




[Figure 8.6: Convolutional Encoder Example (n = 3, k = 1, K = 3). Labels: Stage 1, Stage 2, Stage 3.]

The trellis diagram for this code is shown in Figure 8.7. The solid lines in Figure 8.7 indicate the encoder
state transition when a 0 bit is input to Stage 1 of the encoder, and the dashed lines indicate the state transition
corresponding to a 1 bit input. For example, starting at state S = 00, if a 0 bit is input to Stage 1 then, when the
shift register transitions, the new state will remain as S = 00 (since the 0 in Stage 1 transitions to Stage 2, and
the 0 in Stage 2 transitions to Stage 3, resulting in the new state $S = S_2S_3 = 00$). On the other hand, if a 1 bit is
input to Stage 1 then, when the shift register transitions, the new state will become S = 10 (since the 1 in Stage 1
transitions to Stage 2, and the 0 in Stage 2 transitions to Stage 3, resulting in the new state $S = S_2S_3 = 10$). The
encoder output corresponding to a particular encoder state S and input $S_1$ is written next to the transition lines in
Figure 8.7. This output is the encoder output that results from the encoder addition operations on the bits $S_1$, $S_2$
and $S_3$ in each stage of the encoder. For example, if S = 00 and $S_1 = 1$ then the encoder output $C_1C_2C_3$ has
$C_1 = S_1 + S_2 = 1$, $C_2 = S_1 + S_2 + S_3 = 1$, and $C_3 = S_3 = 0$. This output 110 is drawn next to the dashed
line transitioning from state S = 00 to state S = 10 in Figure 8.7. Note that the encoder output for $S_1 = 0$ and
S = 00 is always the all-zero codeword regardless of the addition operations that form the codeword $C_1C_2C_3$,
since summing together any number of 0s always yields 0. The portion of the trellis between time $t_i$ and $t_{i+1}$ is
called the ith branch of the trellis. Figure 8.7 indicates that the initial state at time $t_0$ is the all-zero state. The trellis
achieves steady state, defined as the point where all states can be entered from either of two preceding states, at
time $t_3$. After this steady state is reached, the trellis repeats itself in each time interval. Note also that in steady
state each state transitions to one of two possible new states. In general trellis structures starting from the all-zero
state at time $t_0$ achieve steady-state at time $t_K$.



[Figure 8.7: Trellis Diagram. States are labeled $S = S_2S_3$; the branch label 110 appears on the repeated transitions.]

For general values of k and K, the trellis diagram will have $2^{K-1}$ states, with $2^k$ paths entering
each node and $2^k$ paths leaving each node. Thus, the number of paths through the trellis grows exponentially with
k, K, and the length of the trellis path.
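
The state transitions and outputs of the encoder in Figure 8.6 can be sketched directly from the stated equations $C_1 = S_1 + S_2$, $C_2 = S_1 + S_2 + S_3$, $C_3 = S_3$. The Python below is an illustration under that assumption; the function names are hypothetical, and the state is held as the pair $(S_2, S_3)$.

```python
def encode_step(state, bit):
    """One shift of the (n=3, k=1, K=3) encoder.

    state is (S2, S3); bit is the input S1.
    Returns (new_state, (C1, C2, C3)) using modulo-2 addition (XOR).
    """
    s1 = bit
    s2, s3 = state
    output = (s1 ^ s2, s1 ^ s2 ^ s3, s3)
    new_state = (s1, s2)        # Stage 1 moves to Stage 2, Stage 2 to Stage 3
    return new_state, output


def encode(bits, state=(0, 0)):
    """Encode a bit sequence; returns the visited states and the branch outputs."""
    states, outputs = [], []
    for bit in bits:
        state, output = encode_step(state, bit)
        states.append(state)
        outputs.append(output)
    return states, outputs


# The case worked in the text: state S = 00 with input bit 1.
new_state, output = encode_step((0, 0), 1)

# State sequence for input 0, 1, 1 starting from S = 01.
states, outputs = encode([0, 1, 1], state=(0, 1))
```

Each call to `encode_step` corresponds to following one solid (input 0) or dashed (input 1) line across a single branch of the trellis.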



Example 8.7: Consider the convolutional code represented by the trellis in Figure 8.7. For an initial state $S =
S_2S_3 = 01$, find the state sequence S and the encoder output C for input bit sequence U = 011.

Solution: The first occurrence of S = 01 in the trellis is at time $t_2$. We see at $t_2$ that if the information bit
$S_1 = 0$ we follow the solid line in the trellis from S = 01 at $t_2$ to S = 00 at $t_3$, and the output corresponding
to this path through the trellis is C = 011. Now at $t_3$, starting at S = 00, for the information bit $S_1 = 1$
we follow the dashed line in the trellis to S = 10 at $t_4$, and the output corresponding to this path through the
trellis is C = 110. Finally, at $t_4$, starting at S = 10, for the information bit $S_1 = 1$ we follow the dashed
line in the trellis to S = 11 at $t_5$, and the output corresponding to this path through the trellis is C = 000.



8.3.2 Maximum Likelihood Decoding

The convolutional code generated by the finite state shift register is basically a finite state machine. Thus, unlike
an (n, k) block code, where maximum likelihood detection entails finding the length-n codeword that is closest to
the received length-n codeword, maximum likelihood detection of a convolutional code entails finding the most
likely sequence of coded symbols C given the received sequence of coded symbols, which we denote by R. In
particular, for a received sequence R, the decoder decides that coded symbol sequence $C^*$ was transmitted if

$p(R|C^*) \ge p(R|C) \;\; \forall C$. (8.51)

Since each possible sequence C corresponds to one path through the trellis diagram of the code, maximum like- 
lihood decoding corresponds to finding the maximum likelihood path through the trellis diagram. For an AWGN 
channel, noise affects each coded symbol independently. Thus, for a convolutional code of rate 1/n, we can express 
the likelihood (8.51) as 

$$p(R|C) = \prod_{i=0}^{\infty} p(R_i|C_i) = \prod_{i=0}^{\infty} \prod_{j=1}^{n} p(R_{ij}|C_{ij}),$$ (8.52)

where $C_i$ is the portion of the code sequence $C$ corresponding to the $i$th branch of the trellis, $R_i$ is the
portion of the received code sequence $R$ corresponding to the $i$th branch of the trellis, $C_{ij}$ is the $j$th coded
symbol corresponding to $C_i$, and $R_{ij}$ is the $j$th received coded symbol corresponding to $R_i$. The log
likelihood function is defined as the log of $p(R|C)$, given as

$$\log p(R|C) = \sum_{i=0}^{\infty} \log p(R_i|C_i) = \sum_{i=0}^{\infty} \sum_{j=1}^{n} \log p(R_{ij}|C_{ij}).$$ (8.53)



The expression 

$$B_i = \sum_{j=1}^{n} \log p(R_{ij}|C_{ij})$$ (8.54)

is called the branch metric since it indicates the component of (8.53) associated with the ith branch of the trellis. 
The sequence or path that maximizes the likelihood function also maximizes the log likelihood function since the 
log is monotonically increasing. However, it is computationally more convenient for the decoder to use the log 
likelihood function since it involves a summation rather than a product. The log likelihood function associated 
with a given path through the trellis is also called the path metric which, from (8.53), is equal to the sum of branch 
metrics along each branch of the path. The path through the trellis with the maximum path metric corresponds to 
the maximum likelihood path. 

The decoder can use either hard decision or soft decision for the expressions $\log p(R_{ij}|C_{ij})$ in the log
likelihood metric. For hard decision decoding, each $R_{ij}$ is decoded as a 1 or a 0. The probability of hard decision
decoding error depends on the modulation and is denoted as $p$. If $R$ and $C$ are $L$ bits long and differ in $d$ places
(i.e. their Hamming distance is $d$), then

$$p(R|C) = p^d (1-p)^{L-d}$$

and

$$\log p(R|C) = -d \log\frac{1-p}{p} + L \log(1-p).$$ (8.55)

Since $p < .5$, we have $\log((1-p)/p) > 0$, so (8.55) is maximized when $d$ is minimized. So the coded sequence $C$
with minimum Hamming distance to the received sequence $R$ corresponds to the maximum likelihood sequence.

In soft decision decoding the values of the received coded symbols ($R_{ij}$) are used directly in the decoder,
rather than quantizing them to 1 or 0. For example, if the $C_{ij}$ are sent via BPSK over an AWGN channel then

$$R_{ij} = \sqrt{E_c}(2C_{ij} - 1) + n_{ij},$$ (8.56)

where $E_c = kE_b/n$ is the energy per coded symbol and $n_{ij}$ denotes Gaussian noise of mean zero and variance
$\sigma^2 = .5N_0$. Thus,

$$p(R_{ij}|C_{ij}) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(R_{ij} - \sqrt{E_c}(2C_{ij} - 1)\right)^2}{2\sigma^2}\right].$$ (8.57)



Maximizing this likelihood function is equivalent to choosing the $C_{ij}$ that is closest in Euclidean distance to $R_{ij}$.
In determining which sequence $C$ maximizes the log likelihood function (8.53), any terms that are common to
two different sequences $C_1$ and $C_2$ can be neglected, since they contribute the same amount to the summation.
Similarly, we can scale all terms in (8.53) without changing the maximizing sequence. Thus, by neglecting scaling
factors and terms in (8.57) that are common to any $C_{ij}$, we can replace $\sum_{j=1}^{n} \log p(R_{ij}|C_{ij})$ in (8.53)
with the equivalent branch metric

$$\mu_i = \sum_{j=1}^{n} R_{ij}(2C_{ij} - 1)$$ (8.58)



and obtain the same maximum likelihood output. 
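
The step from (8.57) to (8.58) can be spelled out by expanding the square in the exponent. The short derivation below is a sketch of that expansion; it relies only on the fact that $(2C_{ij} - 1)^2 = 1$ for $C_{ij} \in \{0, 1\}$.

```latex
\begin{align*}
\log p(R_{ij}|C_{ij})
  &= -\log\left(\sqrt{2\pi}\,\sigma\right)
     - \frac{\left(R_{ij} - \sqrt{E_c}\,(2C_{ij}-1)\right)^2}{2\sigma^2} \\
  &= \underbrace{-\log\left(\sqrt{2\pi}\,\sigma\right)
     - \frac{R_{ij}^2 + E_c}{2\sigma^2}}_{\text{same for } C_{ij}=0 \text{ and } C_{ij}=1}
     \; + \; \underbrace{\frac{\sqrt{E_c}}{\sigma^2}}_{\text{positive scale}} \, R_{ij}\,(2C_{ij}-1).
\end{align*}
```

Dropping the common term and the positive scale factor leaves $R_{ij}(2C_{ij} - 1)$, and summing over the $n$ symbols of a branch gives the metric in (8.58).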

We now illustrate the path metric computation under both hard and soft decisions for the convolutional code of
Figure 8.6 with the trellis diagram in Figure 8.7. For simplicity, we will only consider two possible paths through
the trellis, and compute their corresponding likelihoods for a given received sequence $R$. Assume we start at time
$t_0$ in the all-zero state. The first path we consider is the all-zero path, corresponding to the all-zero input sequence.
The second path we consider starts in state $S = 00$ at time $t_0$ and transitions to state $S = 10$ at time $t_1$, then to
state $S = 01$ at time $t_2$, and finally to state $S = 00$ at time $t_3$, at which point this path merges with the all-zero
path. Since the paths and therefore their branch metrics at times $t < t_0$ and $t > t_3$ are the same, the maximum
likelihood path corresponds to the path whose sum of branch metrics over the branches in which the two paths differ
is larger. From Figure 8.7 we see that the all-zero path through the trellis generates the coded sequence
$C_0 = 000000000$ over the first three branches in the trellis. The second path generates the coded sequence
$C_1 = 110110011$ over the first three branches in the trellis.

Let us first consider hard decision decoding with error probability $p$. Suppose the received sequence over
these three branches is $R = 100110111$. Note that the Hamming distance between $R$ and $C_0$ is 6 while the
Hamming distance between $R$ and $C_1$ is 2. As discussed above, the most likely path therefore corresponds to $C_1$
since it has minimum Hamming distance to $R$. The path metric for the all-zero path is

$$M_0 = \sum_{i=0}^{2} \sum_{j=1}^{3} \log p(R_{ij}|C_{ij}) = 6 \log p + 3 \log(1-p),$$ (8.59)

while the path metric for the other path is

$$M_1 = \sum_{i=0}^{2} \sum_{j=1}^{3} \log p(R_{ij}|C_{ij}) = 2 \log p + 7 \log(1-p).$$ (8.60)

Assuming $p \ll 1$, which is generally the case, this yields $M_0 \approx 6 \log p$ and $M_1 \approx 2 \log p$. Since
$\log p < 0$, this confirms that the second path has a larger path metric than the first.
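
The hard decision comparison above is easy to check numerically. The following sketch evaluates the Hamming distances and the two path metrics; the value $p = 0.01$ and the helper names are assumptions made only for illustration.

```python
import math


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hard_path_metric(r, c, p):
    """Log likelihood d*log(p) + (L-d)*log(1-p) for hard decisions."""
    d, L = hamming(r, c), len(r)
    return d * math.log(p) + (L - d) * math.log(1 - p)


R = "100110111"   # received sequence from the text
C0 = "000000000"  # all-zero path
C1 = "110110011"  # second path
p = 0.01          # ASSUMPTION: illustrative channel error probability

d0, d1 = hamming(R, C0), hamming(R, C1)
M0 = hard_path_metric(R, C0, p)
M1 = hard_path_metric(R, C1, p)
print(d0, d1)   # Hamming distances to the two candidate paths
print(M0, M1)   # path metrics (8.59) and (8.60) evaluated at p
```

Any $p < .5$ gives the same ordering of the two metrics, since the metric decreases with the Hamming distance.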

Let us now consider soft decision decoding over time $t_0$ to $t_3$. Suppose the received sequence (before
demodulation) over these three branches, for $E_c = 1$, is $Z = (.8, -.35, -.15, 1.35, 1.22, -.62, .87, 1.08, .91)$.
The path metric for the all-zero path is

$$M_0 = \sum_{i=0}^{2} \mu_i = \sum_{i=0}^{2} \sum_{j=1}^{3} R_{ij}(2C_{ij} - 1) = \sum_{i=0}^{2} \sum_{j=1}^{3} -R_{ij} = -5.11.$$

The path metric for the second path is

$$M_1 = \sum_{i=0}^{2} \sum_{j=1}^{3} R_{ij}(2C_{ij} - 1) = 4.91.$$

Thus, the second path has a higher path metric than the first. In order to determine if the second path is the 
maximum-likelihood path, we must compare its path metric to that of all other paths through the trellis. 
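
The two soft decision path metrics can be reproduced by applying the branch metric (8.58) directly to the received values. This is a minimal sketch; the function name is hypothetical.

```python
def soft_path_metric(z, c):
    """Sum of R_ij * (2*C_ij - 1) over all coded symbols, as in (8.58)."""
    return sum(r * (2 * int(bit) - 1) for r, bit in zip(z, c))


# Received values and candidate coded sequences from the text.
Z = [0.8, -0.35, -0.15, 1.35, 1.22, -0.62, 0.87, 1.08, 0.91]
M0 = soft_path_metric(Z, "000000000")  # all-zero path
M1 = soft_path_metric(Z, "110110011")  # second path
print(round(M0, 2), round(M1, 2))
```

Rounded to two decimals, the sketch gives the values $-5.11$ and $4.91$ quoted above.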

The difficulty with maximum likelihood decoding is that the complexity of computing the log likelihood 
function (8.53) grows exponentially with the memory of the code, and this computation must be done for every 
possible path through the trellis. The Viterbi algorithm, discussed in the next section, reduces the complexity of 
maximum likelihood decoding by taking advantage of the structure of the path metric computation. 

8.3.3 The Viterbi Algorithm 

The Viterbi algorithm, discovered by Viterbi in 1967 [6], reduces the complexity of maximum likelihood decoding
by systematically removing paths from consideration that cannot achieve the highest path metric. The basic premise
is to look at the partial path metrics associated with all paths entering a given node (Node $N$) in the trellis. Since
the possible paths through the trellis leaving node $N$ are the same for each entering path, the complete trellis path
with the highest path metric that goes through Node $N$ must coincide with the path that has the highest partial path
metric up to node $N$. This is illustrated in Figure 8.8, where Path 1, Path 2, and Path 3 enter Node $N$ (at trellis
depth $n$) with partial path metrics $P^l = \sum_{k=0}^{n-1} B_k^l$, $l = 1, 2, 3$, up to this node. Assume $P^1$ is the
largest of these partial path metrics. The complete path with the highest metric, shown in bold, has branch metrics
$\{B_k\}$ after node $N$. The maximum likelihood path starting from Node $N$, i.e. the path starting from node $N$ with
the largest path metric, has partial path metric $\sum_{k=n}^{\infty} B_k$. The complete path metric for Path $l$,
$l = 1, 2,$ or $3$, up to node $N$ and the maximum likelihood path after node $N$ is $P^l + \sum_{k=n}^{\infty} B_k$,
$l = 1, 2, 3$, and thus the path with the maximum partial path metric $P^l$ up to node $N$ (Path 1 in this example)
must correspond to the path with the largest path metric that goes through node $N$.

The Viterbi algorithm takes advantage of this structure by discarding all paths entering a given node except
the path with the largest partial path metric up to that node. The path that is not discarded is called the survivor
path. Thus, for the example of Figure 8.8, Path 1 is the survivor at node $N$ and Paths 2 and 3 are discarded from
further consideration. Thus, at every stage in the trellis there are $2^{K-1}$ surviving paths, one for each possible
encoder state. A branch for a given stage of the trellis cannot be decoded until all surviving paths at a subsequent
trellis stage overlap with that branch, as shown in Figure 8.9. This figure shows the surviving paths at time $t_{k+3}$.
We see in this figure that all of these surviving paths can be traced back to a common stem from time $t_k$ to $t_{k+1}$.
At this point the decoder can output the codeword $C_i$ associated with this branch of the trellis. Note that there
is not a fixed decoding delay associated with how far back in the trellis a common stem occurs for a given set of
surviving paths; this delay depends on $k$, $K$, and the specific code properties. To avoid a random decoding delay,
the Viterbi algorithm is typically modified such that at a given stage in the trellis, the most likely branch $n$ stages
back is decided upon based on the partial path metrics up to that point. While this modification does not yield
exact maximum likelihood decoding, for $n$ sufficiently large (typically $n \geq 5K$) it is a good approximation.

Figure 8.8: Partial Path Metrics on Maximum Likelihood Path



The Viterbi algorithm must keep track of $2^{k(K-1)}$ surviving paths and their corresponding metrics. At each
stage, $2^k$ metrics must be computed for each node to determine the surviving path, corresponding to the $2^k$ paths
entering each node. Thus, the number of computations in decoding and the memory requirements for the algorithm
increase exponentially with $k$ and $K$. This implies that for practical implementations convolutional codes are
restricted to relatively small values of $k$ and $K$.
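
The survivor-path idea can be illustrated with a compact hard decision Viterbi decoder for a rate 1/3, $K = 3$ code. This is a sketch under stated assumptions, not a production decoder: the output taps in `step` are inferred from the branch outputs of Example 8.7 (Figure 8.6 is not reproduced here), the message is assumed to end with $K - 1 = 2$ zero bits so that the encoder returns to the all-zero state, and the decoder waits until the end of the block instead of using a fixed decision delay. Minimizing accumulated Hamming distance is used as the metric, which is equivalent to maximizing the hard decision path metric when $p < .5$.

```python
import random


def step(state, bit):
    """Encoder transition: state = (S2, S3) -> (next_state, 3 coded bits)."""
    s2, s3 = state
    # ASSUMPTION: taps inferred from the branch labels quoted in Example 8.7.
    return (bit, s2), (bit, bit ^ s2 ^ s3, bit ^ s3)


def encode(bits):
    """Encode a bit list starting from the all-zero state."""
    state, out = (0, 0), []
    for b in bits:
        state, c = step(state, b)
        out.extend(c)
    return out


def viterbi(received, n=3):
    """Hard decision Viterbi decoding of a zero-terminated block."""
    # survivors maps each state to (accumulated distance, decoded bits).
    survivors = {(0, 0): (0, [])}
    for i in range(0, len(received), n):
        r = received[i:i + n]
        new = {}
        for state, (dist, bits) in survivors.items():
            for b in (0, 1):
                nxt, c = step(state, b)
                d = dist + sum(x != y for x, y in zip(r, c))
                # Keep only the best path entering each node (the survivor).
                if nxt not in new or d < new[nxt][0]:
                    new[nxt] = (d, bits + [b])
        survivors = new
    # The tail zeros force the transmitted path to end in the all-zero state.
    return survivors[(0, 0)][1]


random.seed(1)
msg = [random.randint(0, 1) for _ in range(20)] + [0, 0]  # two tail zeros
tx = encode(msg)
rx = tx[:]
rx[7] ^= 1    # introduce two channel errors
rx[31] ^= 1
decoded = viterbi(rx)
print(decoded == msg)
```

Only four survivors are stored at any stage, one per state, regardless of how long the block is.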

8.3.4 Distance Properties 

As with block codes, the error correction capability of convolutional codes depends on the distance between
codeword sequences. Since convolutional codes are linear, the minimum distance between all codeword sequences can
be found by determining the minimum distance from any sequence or equivalently any trellis path to the all-zero
sequence/trellis path. Clearly the trellis path with minimum distance to the all-zero path will diverge and remerge
with the all-zero path, such that the two paths coincide except over some number of trellis branches. To find this
minimum distance path we must consider all paths that diverge from the all-zero state and then remerge with this
state. As an example, in Figure 8.10 we draw all paths in Figure 8.7 between times $t_0$ and $t_5$ that diverge and
remerge with the all-zero state. Note that Path 2 is identical to Path 1, just shifted in time, and therefore is not
considered as a separate path. Note also that we could look over a longer time interval, but any paths that diverge
and remerge over this longer interval would traverse the same branches (shifted in time) as one of these paths plus
some additional branches, and would therefore have larger distances to the all-zero path. In particular, we see that
Path 4 traverses the same branches as Path 1, 00-10-01 and then later 01-00, plus the branches 01-10-01. Thus we need
not consider a longer time interval to find the minimum distance path. For each path in Figure 8.7 we label the Hamming
distance of the codeword on each branch to the all-zero codeword in the corresponding branch of the all-zero path.
By summing up the Hamming distances on all branches of each path we see that Path 1 has a Hamming distance
of 6 and Paths 3 and 4 have Hamming distances of 8. Recalling that dashed lines indicate 1 bit inputs while solid
lines indicate 0 bit inputs, we see that Path 1 corresponds to an input bit sequence from $t_0$ to $t_5$ of 10000, Path 3
corresponds to an input bit sequence of 11000, and Path 4 corresponds to an input bit sequence of 10100. Thus,
Path 1 results in one bit error, relative to the all-zero sequence, and Paths 3 and 4 result in two bit errors.

Figure 8.9: Common Stem for All Survivor Paths in the Trellis

We define the minimum free distance $d_{free}$ of a convolutional code, also called the free distance and written
$d_f$ for short, to be the minimum Hamming distance of all paths through the trellis to the all-zero path, which for
this example is 6. The error correction capability of the code is obtained in the same manner as for block codes,
with $d_{min}$ replaced by $d_f$, so that the code can correct $t$ channel errors with

$$t = \left\lfloor \frac{d_f - 1}{2} \right\rfloor.$$
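
Finding the free distance amounts to a shortest-path search from the all-zero state back to itself, with each branch weighted by the Hamming weight of its codeword. The sketch below uses Dijkstra's algorithm for this. The branch weights are taken from the exponents of $D$ in the state equations (8.61) of Section 8.3.5, with the all-zero state split into a start node `a` and an end node `e`; the function and variable names are hypothetical.

```python
import heapq

# (from, to) -> Hamming weight of the branch codeword, read off (8.61).
EDGES = {
    ("a", "c"): 3,
    ("b", "c"): 1,
    ("c", "b"): 1,
    ("d", "b"): 1,
    ("c", "d"): 2,
    ("d", "d"): 2,
    ("b", "e"): 2,
}


def free_distance(edges, src="a", dst="e"):
    """Minimum total weight over all paths that leave src and reach dst."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        dist, u = heapq.heappop(heap)
        if u == dst:
            return dist
        if dist > best.get(u, float("inf")):
            continue  # stale heap entry
        for (x, y), w in edges.items():
            if x == u and dist + w < best.get(y, float("inf")):
                best[y] = dist + w
                heapq.heappush(heap, (dist + w, y))
    return None


d_f = free_distance(EDGES)
t = (d_f - 1) // 2  # number of correctable channel errors
print(d_f, t)
```

For this code the search returns a free distance of 6, so $t = 2$ channel errors can be corrected.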



8.3.5 State Diagrams and Transfer Functions 

The transfer function of a convolutional code is used to characterize paths that diverge and remerge from the
all-zero path, and is also used to obtain probability of error bounds. The transfer function is obtained from the code's
state diagram representing possible transitions from the all-zero state to the all-zero state. The state diagram for
the code illustrated in Figure 8.7 is shown in Figure 8.11, with the all-zero state $a = 00$ split into a second node $e$
to facilitate representing paths that begin and end in this state. Transitions between states due to a 0 input bit are
represented by solid lines, while transitions due to a 1 input bit are represented by dashed lines. The branches of
the state diagram are labeled $D^0 = 1$, $D^1$, $D^2$, or $D^3$, where the exponent of $D$ corresponds to the Hamming
distance between the codeword, which is shown for each branch transition, and the all-zero codeword in the
all-zero path. The self-loop in node $a$ can be ignored since it does not contribute to the distance properties of the
code.

The state diagram can be represented by state equations for each state. For the example of Figure 8.7 we
obtain state equations corresponding to the four states:

$$X_c = D^3 X_a + D X_b, \quad X_b = D X_c + D X_d, \quad X_d = D^2 X_c + D^2 X_d, \quad X_e = D^2 X_b,$$ (8.61)

where $X_a, \ldots, X_e$ are dummy variables characterizing the partial paths. The transfer function of the code,
describing the paths from state $a$ to state $e$, is defined as $T(D) = X_e/X_a$. By solving the state equations for the code,
Path 1 and 2: 00-10-01-00
Path 3: 00-10-11-01-00 
Path 4: 00-10-01-10-01-00 



Figure 8.10: Path Distances to the All-Zero Path 



which can be done using standard techniques such as Mason's formula, we obtain a transfer function of the form

$$T(D) = \sum_{d=d_f}^{\infty} a_d D^d,$$ (8.62)

where $a_d$ is the number of paths with Hamming distance $d$ from the all-zero path. As stated above, the minimum
Hamming distance to the all-zero path is $d_f$, and the transfer function $T(D)$ indicates that there are $a_{d_f}$ paths
with this minimum distance. For the example of Figure 8.7, we can solve the state equations given in (8.61) to get the
transfer function

$$T(D) = \frac{D^6}{1 - 2D^2} = D^6 + 2D^8 + 4D^{10} + \ldots$$ (8.63)

We see from the transfer function that there is one path with minimum distance $d_f = 6$, and 2 paths with Hamming
distance 8, which is consistent with Figure 8.10. The transfer function is a convenient shorthand for enumerating
the number and corresponding Hamming distance of all paths in a particular code that diverge and later remerge
with the all-zero path.
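
The coefficients $a_d$ can also be obtained by counting paths directly on the state diagram, which gives an independent check of (8.63). The sketch below does this with a small dynamic program over the accumulated Hamming weight, using the branch weights that appear as exponents of $D$ in (8.61). The names are hypothetical, and the self-loop at node $a$ is omitted as in the text.

```python
from collections import defaultdict

# (from, to, Hamming weight) for each branch of the state diagram, per (8.61).
EDGES = [
    ("a", "c", 3),
    ("b", "c", 1),
    ("c", "b", 1),
    ("d", "b", 1),
    ("c", "d", 2),
    ("d", "d", 2),
    ("b", "e", 2),
]


def path_counts(max_d):
    """Return {d: a_d} for all a -> e paths with total weight d <= max_d."""
    # ways[w][s] = number of partial paths from a that reach s with weight w.
    ways = [defaultdict(int) for _ in range(max_d + 1)]
    ways[0]["a"] = 1
    # Every branch has weight >= 1, so ways[w] is final once we reach w.
    for w in range(max_d + 1):
        for u, v, dw in EDGES:
            if w + dw <= max_d and ways[w][u]:
                ways[w + dw][v] += ways[w][u]
    return {w: ways[w]["e"] for w in range(max_d + 1) if ways[w]["e"]}


a = path_counts(10)
print(a)
```

The counts agree with the leading terms of the series expansion: one path at distance 6, two at distance 8, and four at distance 10.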

While the transfer function is sufficient to capture the number and Hamming distance of paths in the trellis to 
the all-zero path, we need a more detailed characterization to compute the bit error probability of the convolutional 
code. We therefore introduce two additional parameters into the transfer function, N and J for this additional 
characterization. The factor N is introduced on all branch transitions associated with a 1 input bit (dashed lines 
in Figure 8.11). The factor J is introduced to every branch in the state diagram such that the exponent of J in 
the transfer function equals the number of branches in any given path from node a to node e. The extended state 
diagram corresponding to the trellis of Figure 8.7 is shown in Figure 8.12. 

The extended state diagram is also represented by state equations. For the example of Figure 8.12 these are 
given by: 

$$X_c = JND^3 X_a + JND X_b, \quad X_b = JD X_c + JD X_d, \quad X_d = JND^2 X_c + JND^2 X_d, \quad X_e = JD^2 X_b.$$ (8.64)

Figure 8.11: State Diagram

Similar to the previous transfer function definition, the transfer function associated with this extended state is
defined as $T(D, N, J) = X_e/X_a$, which for this example yields

$$T(D, N, J) = \frac{J^3 N D^6}{1 - JND^2(1 + J)} = J^3 N D^6 + J^4 N^2 D^8 + J^5 N^2 D^8 + J^5 N^3 D^{10} + \ldots$$ (8.65)

The factor $J$ is most important when we are interested in transmitting finite length sequences; for infinite length
sequences we typically set $J = 1$ to obtain the transfer function for the extended state

$$T(D, N) = T(D, N, J = 1).$$ (8.66)

The transfer function for the extended state tells us more information about the diverging and remerging 
paths; namely, the minimum distance path with Hamming distance 6 is of length 3 and results in a single bit error 
(exponent of N is one), one path of Hamming distance 8 is of length 4 and results in 2 bit errors, and the other 
path of Hamming distance 8 is of length 5 and results in 2 bit errors, consistent with Figure 8.10. The extended
transfer function is a convenient shorthand to represent the Hamming distance, length, and number of bit errors 
corresponding to each diverging and remerging path of a code from the all zero path. We will see in the next section 
that this convenient representation is very useful in characterizing the probability of error for convolutional codes. 
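
The terms of the extended transfer function can be listed by walking the extended state diagram and recording, for each path from $a$ to $e$, its length (exponent of $J$), its number of 1 input bits (exponent of $N$), and its Hamming weight (exponent of $D$). The sketch below does this for short paths, using the branch labels in (8.64); the function name and the tuple layout are hypothetical.

```python
from collections import Counter

# (from, to, D exponent, N exponent) per (8.64); each branch adds one J.
EDGES = [
    ("a", "c", 3, 1),
    ("b", "c", 1, 1),
    ("c", "b", 1, 0),
    ("d", "b", 1, 0),
    ("c", "d", 2, 1),
    ("d", "d", 2, 1),
    ("b", "e", 2, 0),
]


def enumerate_paths(max_len):
    """Counter of (J, N, D) exponents over a -> e paths of <= max_len branches."""
    terms = Counter()
    # frontier maps (state, N exponent, D exponent) -> number of partial paths.
    frontier = Counter({("a", 0, 0): 1})
    for length in range(1, max_len + 1):
        nxt = Counter()
        for (s, n, d), count in frontier.items():
            for u, v, dw, nw in EDGES:
                if u != s:
                    continue
                if v == "e":
                    terms[(length, n + nw, d + dw)] += count
                else:
                    nxt[(v, n + nw, d + dw)] += count
        frontier = nxt
    return terms


terms = enumerate_paths(5)
for (j, n, d), count in sorted(terms.items()):
    print(f"{count} * J^{j} N^{n} D^{d}")
```

Restricting to paths of at most five branches recovers exactly the four terms written out in (8.65).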



8.3.6 Error Probability for Convolutional Codes 

Since convolutional codes are linear codes, the probability of error can be obtained by assuming that the all-zero
sequence is transmitted, and determining the probability that the decoder decides in favor of a different sequence.
We will consider error probability for both hard decision and soft decision decoding.

We first consider soft-decision decoding. We are interested in the probability that the all-zero sequence is
sent, but a different sequence is decoded. If the coded symbols output from the convolutional encoder are sent over
an AWGN channel using coherent BPSK modulation with energy $E_c = R_c E_b$, then it can be shown that if the
all-zero sequence is transmitted, the probability of mistaking this sequence with a sequence Hamming distance $d$
away is [2]

$$P_2(d) = Q\left(\sqrt{\frac{2E_c}{N_0} d}\right) = Q\left(\sqrt{2\gamma_b R_c d}\right).$$ (8.67)

Figure 8.12: Extended State Diagram



We call this probability the pairwise error probability, since it is the error probability associated with a pairwise
comparison of two paths that differ in $d$ bits. The transfer function enumerates all paths that diverge and remerge
with the all-zero path, so by the union bound we can upper bound the probability of mistaking the all-zero path for
another path through the trellis as

$$P_e \leq \sum_{d=d_f}^{\infty} a_d Q\left(\sqrt{2\gamma_b R_c d}\right),$$ (8.68)

where $a_d$ denotes the number of paths of distance $d$ from the all-zero path. This bound can be expressed in terms
of the transfer function itself if we use an exponential to upper bound the Q function, i.e. we use the fact that

$$Q\left(\sqrt{2\gamma_b R_c d}\right) \leq e^{-\gamma_b R_c d}.$$

We then get the upper bound

$$P_e < T(D)\big|_{D = e^{-\gamma_b R_c}}.$$ (8.69)

While this upper bound tells us the probability of mistaking one sequence for another, it does not yield the
probability of bit error, which is more fundamental. We know that the exponent in the factor $N$ of $T(D, N)$
indicates the number of information bit errors associated with selecting an incorrect path through the trellis.
Specifically, we can express $T(D, N)$ as

$$T(D, N) = \sum_{d=d_f}^{\infty} a_d D^d N^{f(d)},$$ (8.70)

where $f(d)$ denotes the number of bit errors associated with a path of distance $d$ from the all-zero path. Then we
can upper bound the bit error probability, for $k = 1$, as [2]

$$P_b \leq \sum_{d=d_f}^{\infty} a_d f(d) Q\left(\sqrt{2\gamma_b R_c d}\right),$$ (8.71)

where the only difference with (8.68) is the weighting factor $f(d)$ corresponding to the number of bit errors in each
incorrect path. If the Q function is upper bounded by the exponential as above we get the upper bound

$$P_b < \frac{dT(D, N)}{dN}\bigg|_{N=1,\, D=e^{-\gamma_b R_c}}.$$ (8.72)

If $k > 1$ then we divide (8.71) or (8.72) by $k$ to obtain $P_b$.

All of these bounds assume coherent BPSK transmission (or coherent QPSK, which is equivalent to two
independent BPSK transmissions). For other modulations, the pairwise error probability $P_2(d)$ must be recomputed
based on the probability of error associated with the given modulation.
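
To give a feel for how loose the exponential bounds are, the sketch below evaluates (8.68), (8.69), (8.71), and (8.72) for the example code with $T(D, N) = ND^6/(1 - 2ND^2)$, which is (8.65) with $J = 1$. Its series expansion has $a_d = 2^m$ paths at distance $d = 6 + 2m$, each with $f(d) = m + 1$ bit errors. The code rate $R_c = 1/3$, the operating point of 5 dB, and the truncation of the sums after 60 terms are assumptions for illustration.

```python
import math


def qfunc(x):
    """Gaussian Q function expressed through the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def soft_bounds(ebno_db, rc=1 / 3, terms=60):
    """Union bounds for soft decision decoding of the example code.

    Returns (Pe via Q, Pe via T(D), Pb via Q, Pb via dT/dN).
    """
    gamma_b = 10 ** (ebno_db / 10)
    D = math.exp(-gamma_b * rc)
    if 2 * D * D >= 1:
        raise ValueError("T(D) does not converge at this SNR")
    pe_closed = D**6 / (1 - 2 * D**2)        # (8.69)
    pb_closed = D**6 / (1 - 2 * D**2) ** 2   # (8.72): dT/dN at N = 1
    pe_q = pb_q = 0.0
    for m in range(terms):                   # truncated sums (8.68), (8.71)
        d = 6 + 2 * m
        q = qfunc(math.sqrt(2 * gamma_b * rc * d))
        pe_q += 2**m * q
        pb_q += 2**m * (m + 1) * q
    return pe_q, pe_closed, pb_q, pb_closed


pe_q, pe_closed, pb_q, pb_closed = soft_bounds(5.0)  # ASSUMPTION: Eb/N0 = 5 dB
print(pe_q, pe_closed)
print(pb_q, pb_closed)
```

The closed-form expressions are always at least as large as the corresponding sums with the Q function, since they replace each Q term by its exponential upper bound.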

Let us now consider hard decision decoding. The probability of selecting an incorrect path at distance $d$ from
the all-zero path, for $d$ odd, is given by

$$P_2(d) = \sum_{k=.5(d+1)}^{d} \binom{d}{k} p^k (1-p)^{d-k},$$ (8.73)

where $p$ is the probability of error on the channel. This is because the incorrect path will be selected only if the
decoded path is closer to the incorrect path than to the all-zero path, i.e. the decoder makes at least $.5(d+1)$ errors.
If $d$ is even, then the incorrect path is selected when the decoder makes more than $.5d$ errors, and the decoder
makes a choice at random if the number of errors is exactly $.5d$. We can simplify the pairwise error probability
using the Chernoff bound to yield

$$P_2(d) < [4p(1-p)]^{d/2}.$$ (8.74)

Following the same approach as in soft decision decoding, we then obtain the error probability bound as

$$P_e < \sum_{d=d_f}^{\infty} a_d [4p(1-p)]^{d/2} = T(D)\big|_{D=\sqrt{4p(1-p)}}$$ (8.75)

and

$$P_b < \sum_{d=d_f}^{\infty} a_d f(d) P_2(d) < \frac{dT(D, N)}{dN}\bigg|_{N=1,\, D=\sqrt{4p(1-p)}}.$$ (8.76)
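
The hard decision expressions can be evaluated in the same way. The sketch below computes the exact pairwise error probability for odd and even $d$, following the description of the tie-break above, compares it with the Chernoff bound (8.74), and evaluates the closed-form bounds for the example code with $T(D, N) = ND^6/(1 - 2ND^2)$. The function names and the channel error probability $p = 0.01$ are assumptions for illustration.

```python
import math


def pairwise_error(d, p):
    """Exact P2(d): more than d/2 errors, plus half of the ties when d is even."""
    total = sum(
        math.comb(d, k) * p**k * (1 - p) ** (d - k)
        for k in range(d // 2 + 1, d + 1)
    )
    if d % 2 == 0:
        # Exactly d/2 errors: the decoder picks one of the two paths at random.
        total += 0.5 * math.comb(d, d // 2) * (p * (1 - p)) ** (d // 2)
    return total


def chernoff(d, p):
    """Chernoff bound (8.74) on the pairwise error probability."""
    return (4 * p * (1 - p)) ** (d / 2)


def hard_bounds(p):
    """Closed-form bounds on Pe and Pb for the example code."""
    D = math.sqrt(4 * p * (1 - p))
    if 2 * D * D >= 1:
        raise ValueError("T(D) does not converge for this p")
    return D**6 / (1 - 2 * D**2), D**6 / (1 - 2 * D**2) ** 2


p = 0.01  # ASSUMPTION: illustrative channel error probability
pe_bound, pb_bound = hard_bounds(p)
print(pairwise_error(6, p), chernoff(6, p))
print(pe_bound, pb_bound)
```

For $d$ odd the lower summation limit `d // 2 + 1` equals $.5(d+1)$, so the same function covers both cases.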



8.4 Concatenated Codes 

A concatenated code uses two levels of coding: an inner code and an outer code, as shown in Figure 8.13. The
inner code is typically designed to remove most of the errors introduced by the channel, and the outer code is 
typically a less powerful code that further reduces error probability when the received coded bits have a relatively 
low probability of error (since most errors are corrected by the inner code). Concatenated codes may have the inner 
and outer codes separated by an interleaver to break up block errors introduced by the channel. Concatenated codes 
typically achieve very low error probability with less complexity than a single code with the same error probability 
performance. The decoding of concatenated codes is typically done in two stages, as indicated in the figure: first 
the inner code is decoded, and then the outer code is decoded separately. This is a suboptimal technique, since in 
fact both codes are working in tandem to reduce error probability. However, the ML decoder for a concatenated 
code, which performs joint decoding, is highly complex. It was discovered in the mid 1990s that a near-optimal 
decoder for concatenated codes can be obtained based on iterative decoding, and that is the basic premise behind 
turbo codes, described in the next section. 



8.5 Turbo Codes 

Turbo codes, introduced in 1993 in a landmark paper by Berrou, Glavieux, and Thitimajshima [9], are very
powerful codes that can come within a fraction of a dB of the Shannon capacity limit on AWGN channels. Turbo codes
and the more general family of codes on graphs with iterative decoding algorithms [11, 12] have been studied
extensively, yet some of their characteristics are still not well understood. The main ideas behind codes on graphs
were introduced by Gallager in the early sixties [10]; however, at the time these coding techniques were thought
impractical and were generally not pursued by researchers in the field. The landmark 1993 paper on turbo codes
[9] provided more than enough motivation to revisit Gallager's and others' work on iterative, graph-based decoding
techniques.

Figure 8.13: Concatenated Coding

As first described by Berrou et al., turbo codes consist of two key components: parallel concatenated encoding
and iterative, "turbo" decoding [9, 13]. A typical parallel concatenated encoder is shown in Figure 8.14. It consists
of two parallel convolutional encoders separated by an interleaver, with the input to the channel being the data
bits $m$ along with the parity bits $X_1$ and $X_2$ output from each of the encoders in response to input $m$. Since
the $m$ information bits are transmitted as part of the codeword, we call this a systematic turbo code. The key to
parallel concatenated encoding lies in the recursive nature of the encoders and the impact of the interleaver on the
information stream. Interleavers also play a significant role in the elimination of error floors [13].
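
The structure of a parallel concatenated encoder can be sketched in a few lines. The example below is illustrative only: the component code (a recursive systematic convolutional encoder with feedback polynomial $1 + D + D^2$ and feedforward polynomial $1 + D^2$), the random interleaver, and the block length of 16 are all assumptions, not the parameters of Figure 8.14.

```python
import random


def rsc_parity(bits):
    """Parity stream of a recursive systematic convolutional encoder.

    ASSUMPTION: feedback polynomial 1 + D + D^2, feedforward 1 + D^2.
    """
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # recursive bit fed back into the register
        parity.append(a ^ s2)  # feedforward taps at delays 0 and 2
        s1, s2 = a, s1         # shift the register
    return parity


def turbo_encode(m, perm):
    """Return (systematic bits, parity X1, parity X2)."""
    x1 = rsc_parity(m)                       # encoder 1 sees m directly
    x2 = rsc_parity([m[i] for i in perm])    # encoder 2 sees interleaved m
    return m, x1, x2


random.seed(0)
m = [random.randint(0, 1) for _ in range(16)]
perm = random.sample(range(len(m)), len(m))  # hypothetical random interleaver
sys_bits, x1, x2 = turbo_encode(m, perm)
print(sys_bits, x1, x2, sep="\n")
```

Because the component encoder is recursive, a single 1 at its input produces a parity response that does not die out, which is why the interleaver's reordering of the input has such a strong effect on the second parity stream.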




Figure 8.14: Parallel Concatenated (Turbo) Encoder. 

Iterative, or "turbo", decoding exploits the component-code substructure of the turbo encoder by associating
a component decoder with each of the component encoders. More specifically, each decoder performs soft
input/soft output decoding, as shown in Figure 8.15 for the example encoder of Figure 8.14. In this figure Decoder
1 generates a soft decision in the form of a probability measure $p(m_1)$ on the transmitted information bits based
on the received codeword $(m, X_1)$. The probability measure is generated by either a maximum a posteriori (MAP)
probability algorithm or a soft output Viterbi algorithm (SOVA). This reliability information is passed to Decoder
2, which generates its own probability measure $p(m_2)$ from its received codeword $(m, X_2)$ and the probability
measure $p(m_1)$. This reliability information is input to Decoder 1, which revises its measure $p(m_1)$ based on this
information and the original received codeword. Decoder 1 sends the new reliability information to Decoder 2,
which revises its measure using this new information. Turbo decoding proceeds in an iterative manner, with the
two component decoders alternately updating their probability measures. Ideally the decoders eventually agree on
probability measures that reduce to hard decisions $m = m_1 = m_2$. However, the stopping condition for turbo
decoding is not well-defined, in part because there are many cases in which the turbo decoding algorithm does not
converge; i.e., the decoders cannot agree on the value of $m$. Several methods have been proposed for detecting
convergence (if it occurs), including bit estimate variance [Berr96] and neural net-based techniques [14].




Figure 8.15: Turbo Decoder. 

The simulated performance of turbo codes over multiple iterations of the decoder is shown in Figure 8.16 for
a code composed of two rate 1/2 convolutional codes with constraint length $K = 5$ separated by an interleaver
of depth $d = 2^{16} = 65,536$. The decoder converges after approximately 18 iterations. This curve indicates
several important aspects of turbo codes. First, note their exceptional performance: bit error probability of $10^{-6}$ at
an $E_b/N_0$ of less than 1 dB. In fact, the original turbo code proposed in [9] performed within .5 dB of the Shannon
capacity limit at $P_b = 10^{-5}$. The intuitive explanation for the amazing performance of turbo codes is that the
code complexity introduced by the encoding structure is similar to the codes that achieve Shannon capacity. The
iterative procedure of the turbo decoder allows these codes to be decoded without excessive complexity. However,
note that the turbo code exhibits an error floor: in Figure 8.16 this floor occurs at $10^{-6}$. This floor is problematic
for systems that require extremely low bit error rates. Several mechanisms have been investigated to lower the
error floor, including bit interleaving and increasing the constraint length of the component codes.

Figure 8.16: Turbo Code Performance (Rate 1/2, $K = 5$ component codes with interleaver depth $2^{16}$). Horizontal
axis: $E_b/N_0$ in dB.

An alternative to parallel concatenated coding is serial concatenated coding [15]. In this coding technique,
one component code serves as an outer code, and the output of this first encoder is interleaved and passed to a
second encoder. The output of the second encoder comprises the coded bits. Iterative decoding between the inner
and outer codes is used for decoding. There has been much work comparing serial and parallel concatenated code
performance, e.g. [15, 16, 17]. While both codes perform very well under similar delay and complexity conditions,
serial concatenated coding in some cases performs better at low bit error rates and also can exhibit a lower error
floor.



8.6 Low Density Parity Check Codes 

Low density parity check (LDPC) codes were originally invented by Gallager in his 1961 Master's thesis [10].
However, these codes were largely ignored until the introduction of turbo codes, which rekindled some of the same
ideas. Subsequent to the landmark paper on turbo codes in 1993 [9], LDPC codes were reinvented by MacKay
and Neal [18] and by Wiberg [19] in 1996. Shortly thereafter it was recognized that these new code designs were
actually reinventions of Gallager's original work, and subsequently much work has been devoted to finding the
capacity limits, encoder and decoder designs, and practical implementation of LDPC codes for different channels.

LDPC codes are linear block codes with a particular structure for the parity check matrix $H$, which was
defined in Section 8.2.3. Specifically, a $(d_v, d_c)$ regular binary LDPC code has a parity check matrix $H$ with $d_v$
ones in each column and $d_c$ ones in each row, where $d_v$ and $d_c$ are chosen as part of the codeword design and are
small relative to the codeword length. Since the fraction of nonzero entries in $H$ is small, the parity check matrix
for the code has a low density, and hence the name low density parity check codes.
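
The regularity condition is simple to check in code. The sketch below tests whether a binary matrix is $(d_v, d_c)$ regular and computes the syndrome $Hc^T$ modulo 2 for a candidate codeword. The $4 \times 8$ matrix is a toy example made up for illustration; it is far too small and too dense to be a useful LDPC code, and the function names are hypothetical.

```python
# Toy parity check matrix (ASSUMPTION: illustrative only, not from the text).
H = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
]


def regular_degrees(H):
    """Return (d_v, d_c) if every column has d_v ones and every row d_c ones."""
    col_weights = {sum(row[j] for row in H) for j in range(len(H[0]))}
    row_weights = {sum(row) for row in H}
    if len(col_weights) == 1 and len(row_weights) == 1:
        return col_weights.pop(), row_weights.pop()
    return None


def syndrome(H, c):
    """Parity checks H c^T over GF(2); all zeros means c is a codeword."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]


degrees = regular_degrees(H)
s_valid = syndrome(H, [1, 0, 1, 0, 0, 0, 0, 0])
s_error = syndrome(H, [1, 0, 0, 0, 0, 0, 0, 0])
print(degrees, s_valid, s_error)
```

In a practical LDPC code the codeword length is in the thousands while $d_v$ and $d_c$ stay small, so the fraction of ones in $H$ is tiny.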

Provided that the codeword length is long, LDPC codes achieve performance close to the Shannon limit, in
some cases surpassing the performance of parallel or serially concatenated codes [24]. The fundamental practical
difference between turbo codes and LDPC codes is that turbo codes tend to have low encoding complexity (linear in
blocklength) but high decoding complexity (due to their iterative nature and message passing). In contrast, LDPC
codes tend to have relatively high encoding complexity but low decoding complexity. In particular, like turbo
codes, LDPC decoding uses iterative techniques, which are related to Pearl's belief propagation commonly used
by the artificial intelligence community [25]. However, the belief propagation corresponding to LDPC decoding
may be simpler than for turbo decoding [25, 26]. In addition, the belief propagation decoding is parallelizable
and can be closely approximated with very low complexity decoders [20], although this may also be possible for
turbo decoding. Finally, the decoding algorithm for LDPC codes can detect when a correct codeword has been
decoded, which is not necessarily the case for turbo codes. There remain many open questions regarding the
relative performance of turbo and LDPC codes.

Additional work in the area of LDPC codes includes finding capacity limits for these codes [20], determining
effective code designs [29] and efficient encoding and decoding algorithms [20, 28], and expanding the code
designs to include nonregular [24] and nonbinary LDPC codes [21] as well as coded modulation [22].

8.7 Coded Modulation 

Although Shannon proved the capacity theorem for AWGN channels in the late 1940s, it wasn’t until the 1990s that 
rates approaching the Shannon limit were attained, primarily for AWGN channels with binary modulation using 
turbo codes. Shannon’s theorem predicted the possibility of reducing both energy and bandwidth simultaneously 
through coding. However, as described in Section 8.1, traditional error-correction coding schemes, such as block,
convolutional, and turbo codes, provide coding gain at the expense of increased bandwidth or reduced data rate.

The spectrally-efficient coding breakthrough came when Ungerboeck [30] introduced a coded-modulation 
technique to jointly optimize both channel coding and modulation. This joint optimization results in significant 
coding gains without bandwidth expansion. Ungerboeck’s trellis-coded modulation, which uses multilevel/phase 
signal modulation and simple convolutional coding with mapping by set partitioning, has remained superior over 
more recent developments in coded modulation (coset and lattice codes), as well as more complex trellis codes 
[31]. We now outline the general principles of this coding technique. Comprehensive treatments of trellis, lattice, 
and coset codes can be found in [30, 32, 31].

The basic scheme for trellis and lattice coding, or more generally, any type of coset coding, is depicted in 
Figure 8.17. There are five elements required to generate the coded-modulation: 

1. A binary encoder E, block or convolutional, that operates on k uncoded data bits to produce k + r coded
bits.

2. A subset selector, which uses the coded bits to choose one of $2^{k+r}$ subsets from a partition of the
N-dimensional signal constellation.

3. A point selector, which uses n - k additional uncoded bits to choose one of the $2^{n-k}$ signal points in the
selected subset.

4. A constellation map, which maps the selected point from N-dimensional space to a sequence of N/2 points
in two-dimensional space.

5. An MQAM modulator (or other M-ary modulator).

The first two steps described above are the channel coding, and the remaining steps are the modulation. The
receiver essentially reverses the modulation and coding steps: after MQAM demodulation and an inverse 2/N
constellation mapping, decoding is done in essentially two stages: first, the points within each subset that are
closest to the received signal point are determined; then, the maximum-likelihood subset sequence is calculated.
When the encoder E is a convolutional encoder, this coded-modulation scheme is referred to as a trellis code; for
E a block encoder, it is called a lattice (or block) code.
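
The encoder structure can be sketched in a few lines for the simplest case N = 2, where the constellation map is the identity. The sketch assumes k = 1, r = 1, and n - k = 2, a 16-QAM constellation with points at odd integers, and a rate-1/2, four-state convolutional encoder with generators $1 + D + D^2$ and $1 + D^2$. All of these are illustrative choices, not an optimized trellis code from the references, and the function and class names are hypothetical.

```python
from itertools import product

LEVELS = (-3, -1, 1, 3)  # 16-QAM coordinates, minimum distance 2


def subset_index(point):
    """Label the four subsets by the parity of each coordinate's level index."""
    x, y = point
    return 2 * (((x + 3) // 2) % 2) + (((y + 3) // 2) % 2)


# Partition the constellation into 2^(k+r) = 4 subsets of 2^(n-k) = 4 points.
SUBSETS = {s: [] for s in range(4)}
for p in product(LEVELS, LEVELS):
    SUBSETS[subset_index(p)].append(p)


def min_sq_dist(points):
    """Minimum squared Euclidean distance between distinct points."""
    return min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
               for i, a in enumerate(points) for b in points[i + 1:])


class ConvEncoder:
    """Rate-1/2, 4-state encoder (illustrative generators 1+D+D^2 and 1+D^2)."""

    def __init__(self):
        self.state = (0, 0)

    def step(self, bit):
        s1, s2 = self.state
        out = (bit ^ s1 ^ s2, bit ^ s2)
        self.state = (bit, s1)
        return out


def encode(data_bits, uncoded_pairs):
    """Map one coded data bit and two uncoded bits to each 16-QAM symbol."""
    enc = ConvEncoder()
    symbols = []
    for b, (u1, u0) in zip(data_bits, uncoded_pairs):
        c1, c0 = enc.step(b)                  # binary encoder E: k -> k + r bits
        subset = SUBSETS[2 * c1 + c0]         # subset selector
        symbols.append(subset[2 * u1 + u0])   # point selector
    return symbols


symbols = encode([1, 0, 1], [(0, 0), (1, 1), (0, 1)])
```

With this partition the minimum squared distance within a subset is 16, compared with 4 for the full constellation, so the uncoded bits are protected by the partition while the coded bits rely on the distance between subset sequences.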
Figure 8.17: General Coding Scheme.



The steps described above essentially decouple the channel coding gain from the gain associated with signal
shaping in the modulation. Specifically, the code distance properties, and thus the channel coding gain, are
determined by the encoder (E) properties and the subset partitioning, which are essentially decoupled from signal
shaping. We will discuss the channel coding gain in more detail below. Optimal shaping of the signal constellation
provides up to an additional 1.53 dB of shaping gain (for asymptotically large N), independent of the channel
coding scheme¹. However, the performance improvement from shaping gain is offset by the corresponding complexity
of the constellation map, which grows exponentially with N. The size of the transmit constellation is determined
by the average power constraint, and doesn't affect the shape or coding gain.

The channel coding gain results from a selection of sequences among all possible sequences of signal points. If we
consider a sequence of N input bits as a point in N-dimensional space (the sequence space), then this selection is used to
guarantee some minimum distance $d_{\min}$ in the sequence space between possible input sequences. Errors generally
occur when a sequence is mistaken for its closest neighbor, and in AWGN channels this error probability is a
decreasing function of $d_{\min}^2$. We can thus decrease the BER by increasing the separation between each point in
the sequence space by a fixed amount ("stretching" the space). However, this will result in a proportional power
increase, so no net coding gain is realized. The effective power gain of the channel code is, therefore, the minimum
squared distance between allowable sequence points (the sequence points obtained through coding), multiplied by
the density of the allowable sequence points. Specifically, if the minimum distance and density of points in the
sequence space are denoted by $d_0$ and $\Delta_0$, respectively, and if the minimum distance and density of points in
the sequence space obtained through coding are denoted by $d_{\min}$ and $\Delta$, respectively, then maximum-likelihood
sequence detection yields a channel coding gain of

$$G_c = \left(\frac{d_{\min}^2}{d_0^2}\right)\left(\frac{\Delta}{\Delta_0}\right)^{2/N}. \qquad (8.77)$$

The second bracketed term in this expression is also referred to as the constellation expansion factor, and equals
$2^{-r}$ (per N dimensions) for a redundancy of r bits in the encoder E [31].

Some of the nominal coding gain in (8.77) is lost due to correct sequences having more than one nearest
neighbor in the sequence space, which increases the possibility of incorrect sequence detection. This loss in
coding gain is characterized by the error coefficient, which is tabulated for most common lattice and trellis codes
in [31]. In general, the error coefficient is larger for lattice codes than for trellis codes with comparable values of
$G_c$.

¹ A square constellation has 0 dB of shaping gain; a circular constellation, which is the geometrical figure with the least average energy
for a given area, achieves the maximum shape gain for a given N [31].

Channel coding is done using set partitioning of lattices. A lattice is a discrete set of vectors in real Euclidean
N-space that forms a group under ordinary vector addition, so the sum or difference of any two vectors in the
lattice is also in the lattice. A sub-lattice is a subset of a lattice that is itself a lattice. The sequence space for
uncoded M-QAM modulation is just the N-cube², so the minimum distance between points is no different than
in the two-dimensional case. By restricting input sequences to lie on a lattice in N-space that is denser than the
N-cube, we can increase $d_{\min}$ while maintaining the same density (or equivalently, the same average power) in the
transmit signal constellation; hence, there is no constellation expansion. The N-cube is a lattice; however, for every
N > 1 there are denser lattices in N-dimensional space. Finding the densest lattice in N dimensions is a well-known
mathematical problem, and has been solved for all N for which the decoder complexity is manageable³.
Once the densest lattice is known, we can form partitioning subsets, or cosets, of the lattice through translation of
any sublattice. The choice of the partitioning sublattice will determine the size of the partition, i.e. the number of
subsets that the subset selector in Figure 8.17 has to choose from. Data bits are then conveyed in two ways: through
the sequence of cosets from which constellation points are selected, and through the points selected within each
coset. The density of the lattice determines the distance between points within a coset, while the distance between
subset sequences is essentially determined by the binary code properties of the encoder E and its redundancy r.
If we let $d_p$ denote the minimum distance between points within a coset, and $d_s$ denote the minimum distance
between the coset sequences, then the minimum distance of the code is $d_{\min} = \min(d_p, d_s)$. The effective coding gain
is given by

$$G_c = 2^{-2r/N} d_{\min}^2, \qquad (8.78)$$

where $2^{-2r/N}$ is the constellation expansion factor (in two dimensions) from the r extra bits introduced by the
binary channel encoder.

Returning to Figure 8.17, suppose that we want to send m = n + r bits per dimension, so an N-sequence
conveys mN bits. If we use the densest lattice in N-space that lies within an N-sphere, where the radius of
the sphere is just large enough to enclose $2^{mN}$ points, then we achieve a total coding gain which combines the
channel gain (resulting from the lattice density and the encoder properties) with the shaping gain of the N-sphere
over the N-rectangle. Clearly, the channel coding gain is decoupled from the shape gain. An increase in signal
power would allow us to use a larger N-sphere, and hence transmit more uncoded bits. It is possible to generate
maximum-density N-dimensional lattices for N = 4, 8, 16, and 24 using a simple partition of the two-dimensional
rectangular lattice combined with either conventional block or convolutional coding. Details of this type of code
construction, and the corresponding decoding algorithms, can be found in [32] for both lattice and trellis codes.
For these constructions, an effective coding gain of approximately 1.5, 3.0, 4.5, and 6.0 dB is obtained with
lattice codes, for N = 4, 8, 16, and 24, respectively. Trellis codes exhibit higher coding gains with comparable
complexity.

We conclude this section with an example of coded modulation: the N = 8, 3 dB gain lattice code proposed
in [32]. First, the two-dimensional signal constellation is partitioned into four subsets as shown in Figure 8.18,
where the subsets are represented by the points $A_0$, $A_1$, $B_0$, and $B_1$, respectively. From this subset partition, we
form an 8-dimensional lattice by taking all sequences of four points in which all points are either A points or B
points and moreover, within a four point sequence, the point subscripts satisfy the parity check $i_1 + i_2 + i_3 + i_4 = 0$
(so the sequence subscripts must be codewords in the (4,3) parity-check code, which has a minimum Hamming
distance of two). Thus, three data bits and one parity check bit are used to determine the lattice subset. The square
of the minimum distance resulting from this subset partition is four times that of the uncoded signal constellation,
yielding a 6 dB gain. However, the extra parity check bit expands the constellation by 1/2 bit per dimension, which
from Section 5.3.3 costs an additional factor of $4^{0.5} = 2$, or 3 dB. Thus, the net coding gain is 6 - 3 = 3 dB. The
remaining data bits are used to choose a point within the selected subset, so for a data rate of m bits/symbol, the
four lattice subsets must each have $2^{m-1}$ points⁴.

² The Cartesian product of two-dimensional rectangular lattices with points at odd integers.

³ The complexity of the maximum-likelihood decoder implemented with the Viterbi algorithm is roughly proportional to N.
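
The arithmetic in this example can be reproduced directly. The sketch below enumerates the admissible subscript patterns of the (4,3) parity-check code, confirms its minimum Hamming distance, and evaluates the net gain from the two factors quoted above (a fourfold increase in squared minimum distance and a power penalty of $4^{0.5}$ for the half bit per dimension of expansion). Variable names are illustrative.

```python
import math
from itertools import product

# Subscript patterns (i1, i2, i3, i4) with even parity: the (4,3) parity-check code.
codewords = [w for w in product((0, 1), repeat=4) if sum(w) % 2 == 0]


def hamming(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))


d_hamming = min(hamming(a, b)
                for i, a in enumerate(codewords) for b in codewords[i + 1:])

distance_gain_db = 10 * math.log10(4)          # squared minimum distance grows 4x
expansion_loss_db = 10 * math.log10(4 ** 0.5)  # 1/2 bit per dimension of expansion
net_gain_db = distance_gain_db - expansion_loss_db
```

The eight even-parity patterns have minimum Hamming distance two, and the net gain evaluates to about 3.01 dB, which the text rounds to 3 dB.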




Figure 8.18: Subset Partition for an Eight-Dimensional Lattice. 

Coded modulation using turbo codes has also been investigated [33, 34, 35]. This work shows that turbo
trellis coded modulation can come very close to the Shannon limit for nonbinary signalling.

8.8 Coding and Interleaving for Fading Channels 

Block codes, convolutional codes, and coded modulation are designed for good performance in AWGN channels. In fading
channels errors associated with the demodulator tend to occur in bursts, corresponding to the times when the
channel is in a deep fade. Most codes designed for AWGN channels cannot correct for the long bursts of errors
exhibited in fading channels. Hence, codes designed for AWGN channels can exhibit worse performance in fading
than an uncoded system.

To improve performance of coding in fading channels, coding is typically combined with interleaving to 
mitigate the effect of error bursts. The basic premise of coding and interleaving is to spread error bursts due 
to deep fades over many codewords such that each received codeword only exhibits at most a few simultaneous 
symbol errors, which can be corrected for. The spreading out of burst errors is accomplished by an interleaver and 
the error correction is accomplished by the code. The size of the interleaver must be large enough so that fading is 
independent across a received codeword. Slowly fading channels require large interleavers, which in turn can lead 
to large delays. 

Coding and interleaving is a form of diversity, and performance of coding and interleaving is often
characterized by the diversity order associated with the resulting probability of error. This diversity order is typically
a function of the minimum Hamming distance of the code. Thus, designs for coding and interleaving on fading
channels must focus on maximizing the diversity order of the code, rather than on metrics like Euclidean distance
which are used as a performance criterion in AWGN channels. In the following sections we discuss coding and
interleaving for block, convolutional, and coded modulation in more detail. We will assume that the receiver has
knowledge of the channel fading, which greatly simplifies both the analysis and the decoder. Estimates of channel
fading are commonly obtained through pilot symbol transmissions [36, 37]. ML detection of coded signals in
fading without this channel knowledge is computationally intractable [38], so it usually requires approximations to
either the ML decoding metric or the channel [38, 39, 40, 41, 42, 43]. Note that turbo codes designed for AWGN
channels, described in Section 8.5, have an interleaver inherent to the code design. However, the interleaver design
considerations for AWGN channels are different than for fading channels. A discussion of interleaver design and
performance analysis for turbo codes in fading channels can be found in [44, Chapter 8], [45, 46].

⁴ This yields m - 1 bits/symbol, with the additional bit/symbol conveyed by the channel code.



8.8.1 Block Coding with Interleaving 

Block codes are typically combined with block interleaving to spread out burst errors from fading. A block
interleaver is an array with d rows and n columns, as shown in Figure 8.19. For block interleavers designed for an
(n, k) block code, codewords are read into the interleaver by rows so that each row contains an (n, k) codeword.
The interleaver contents are read out by columns into the modulator for subsequent transmission over the channel.
During transmission codeword symbols in the same codeword are separated by d - 1 other symbols, so symbols
in the same codeword experience approximately independent fading if their separation in time is greater than the
channel coherence time: i.e. if $dT_s > T_c \approx 1/B_d$, where $T_s$ is the codeword symbol duration, $T_c$ is the channel
coherence time, and $B_d$ is the channel Doppler. An interleaver is called a deep interleaver if the condition $dT_s > T_c$
is satisfied. The deinterleaver is an array identical to the interleaver. Bits are read into the deinterleaver from the
demodulator by column so that each row of the deinterleaver contains a codeword (whose bits have been corrupted
by the channel). The deinterleaver output is read into the decoder by rows, i.e. one codeword at a time.

Figure 8.19 illustrates the ability of coding and interleaving to correct for bursts of errors. Suppose our coding
scheme is an (n, k) binary block code with error correction capability t = 2. If this codeword is transmitted through
a channel with an error burst of three symbols, then three out of four of the codeword symbols will be received
in error. Since the code can only correct 2 or fewer errors, the codeword will be decoded in error. However, if
the codeword is put through an interleaver then, as shown in Figure 8.19, the error burst of three symbols will be
spread out over three separate codewords. Since a single symbol error can be easily corrected by an (n, k) code 
with t = 2, the original information bits can be decoded without error. Convolutional interleavers are similar in 
concept to block interleavers, and are better suited to convolutional codes, as will be discussed in Section 8.8.2. 
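
The row-in, column-out operation and its effect on a burst can be sketched as follows. The example uses d = 3 codewords of length n = 4 and a burst of three consecutive channel symbols, matching the scenario described above. The symbol labels, the burst position, and the function names are illustrative assumptions.

```python
def interleave(codewords):
    """Write codewords as rows of a d x n array, then read out by columns."""
    d, n = len(codewords), len(codewords[0])
    return [codewords[r][c] for c in range(n) for r in range(d)]


def deinterleave(stream, d, n):
    """Write the received stream by columns, then read out rows (codewords)."""
    return [[stream[c * d + r] for c in range(n)] for r in range(d)]


d, n = 3, 4
codewords = [[f"c{r}{c}" for c in range(n)] for r in range(d)]  # symbol labels
tx = interleave(codewords)

burst = {3, 4, 5}  # three consecutive channel positions hit by a deep fade
rx = ["X" if i in burst else s for i, s in enumerate(tx)]

received_codewords = deinterleave(rx, d, n)
errors_per_codeword = [row.count("X") for row in received_codewords]
```

After deinterleaving, each of the three codewords contains exactly one corrupted symbol, which is within the correction capability of the code.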

Performance analysis of coding and interleaving requires pairwise error probability analysis or application of
the Chernoff or union bounds. Details of this analysis can be found in [2, Chapter 14.6]. The union bound provides
a simple approximation to performance. Assume a Rayleigh fading channel with deep interleaving such that each
coded symbol fades independently. Then the union bound for an (n, k) block code with soft-decision decoding
under noncoherent FSK modulation yields a codeword error probability bounded as

$$P_e < (2^k - 1)[4p(1 - p)]^{d_{\min}}, \qquad (8.79)$$

where $d_{\min}$ is the minimum Hamming distance of the code and

$$p = \frac{1}{2 + R_c\bar{\gamma}_b}. \qquad (8.80)$$
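
A short numerical sketch shows how the bound $P_e < (2^k - 1)[4p(1 - p)]^{d_{\min}}$ with $p = 1/(2 + R_c\bar{\gamma}_b)$ behaves. It assumes that $R_c = k/n$ is the code rate and $\bar{\gamma}_b$ is the average SNR per bit in linear units, and it uses a (7,4) code with $d_{\min} = 3$ purely as an illustration. The function name is hypothetical.

```python
def fsk_union_bound(k, d_min, code_rate, avg_snr_per_bit):
    """Union bound on codeword error for noncoherent FSK in Rayleigh fading."""
    p = 1.0 / (2.0 + code_rate * avg_snr_per_bit)        # p as in (8.80)
    return (2 ** k - 1) * (4.0 * p * (1.0 - p)) ** d_min  # bound as in (8.79)


k, n, d_min = 4, 7, 3          # illustrative (7,4) code with d_min = 3
bound_30db = fsk_union_bound(k, d_min, k / n, 10 ** 3)
bound_40db = fsk_union_bound(k, d_min, k / n, 10 ** 4)

# Each 10 dB of extra SNR lowers the bound by nearly 10**d_min at high SNR.
decade_ratio = bound_30db / bound_40db
```

The ratio between the two values is close to $10^{d_{\min}}$, which is the diversity-order behavior discussed in this section. At low SNR the bound can exceed one and is then uninformative.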



Similarly, for slowly fading channels where a coherent phase reference can be obtained, the union bound on
codeword error probability for an (n, k) block code with soft-decision decoding and BPSK modulation yields

$$P_e < 2^k \binom{2d_{\min} - 1}{d_{\min}} \left(\frac{1}{4R_c\bar{\gamma}_b}\right)^{d_{\min}}. \qquad (8.81)$$

Figure 8.19: The Interleaver/De-interleaver operation.



Note that both (8.79) and (8.81) are similar to the formula for error probability under MRC diversity combining
given by (7.23), with $d_{\min}$ providing the diversity order. Similar formulas apply for hard decoding, with diversity
order reduced by a factor of two relative to soft-decision decoding. Thus, designs for block coding and interleaving
over fading channels optimize their performance by maximizing the Hamming distance of the code.

Coding and interleaving is a suboptimal coding technique, since the correlation of the fading which affects 
subsequent bits contains information about the channel which could be used in a true maximum-likelihood de- 
coding scheme. By essentially throwing away this information, the inherent capacity of the channel is decreased 
[5]. Despite this capacity loss, interleaving codes designed for AWGN is a common coding technique for fad- 
ing channels, since the complexity required to do maximum-likelihood decoding on correlated coded symbols is 
prohibitive. 



Example 8.8: Consider a Rayleigh fading channel with a Doppler of $B_d$ = 80 Hz. The system uses a (5,2) block
code and interleaving to compensate for the fading. If the codeword symbols are sent through the channel at 30
Kbps, what is the required interleaver depth to obtain independent fading on each symbol? What is the longest
burst of codeword symbol errors that can be corrected and the total interleaver delay for this depth?

Solution: The (5,2) code has a minimum distance of 3 so it can correct t = .5(3 - 1) = 1 codeword symbol
error. The codeword symbols are sent through the channel at a rate $R_s$ = 30 Kbps, so the symbol time is
$T_s = 1/R_s = 3.3 \cdot 10^{-5}$ s. Assume a coherence time for the channel of $T_c = 1/B_d = .0125$ s. The bits in the interleaver
are separated by $dT_s$, so we require $dT_s > T_c$ for independent fading on each codeword symbol. Solving for
d yields $d > T_c/T_s = 375$. Since the interleaver spreads a burst of errors over the depth d of the interleaver,
a burst of d symbol errors in the interleaved codewords will result in just one symbol error per codeword after
deinterleaving, which can be corrected. So the system can tolerate an error burst of 375 symbols. However, all
rows of the interleaver must be filled before it can be read out by columns, hence the total delay of the interleaver is
$ndT_s = 5 \cdot 375 \cdot 3.3 \cdot 10^{-5}$ s = 62.5 msec. This delay exceeds the delay that can be tolerated in a voice system. We
thus see that the price paid for correcting long error bursts through coding and interleaving is significant delay.
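
The numbers in this example can be checked with a few lines of arithmetic. The sketch follows the solution in taking the depth at the boundary value $d = T_c/T_s$. Variable names are illustrative.

```python
doppler_hz = 80.0        # channel Doppler B_d
symbol_rate = 30e3       # coded symbols per second, R_s
n, d_min = 5, 3          # (5,2) block code with minimum distance 3

t = (d_min - 1) // 2             # correctable symbol errors per codeword
T_s = 1.0 / symbol_rate          # symbol time in seconds
T_c = 1.0 / doppler_hz           # coherence time, taken as 1 / B_d
depth = round(T_c / T_s)         # interleaver depth d at the boundary d = T_c / T_s
max_burst = t * depth            # longest correctable burst, in symbols
delay_s = n * depth * T_s        # time to fill all rows before readout
```

This gives a depth of 375, a correctable burst of 375 symbols, and a delay of 62.5 msec, in agreement with the solution.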



8.8.2 Convolutional Coding with Interleaving 

As with block codes, convolutional codes suffer performance degradation in fading channels, since the code is 
not designed to correct for bursts of errors. Thus, it is common to use an interleaver to spread out error bursts. 
In block coding the interleaver spreads errors across different codewords. Since there is no similar notion of a 
codeword in convolutional codes, a slightly different interleaver design is needed to mitigate the effect of burst 
errors. The interleaver commonly used with convolutional codes, called a convolutional interleaver, is designed 
to both spread out burst errors and to work well with the incremental nature of convolutional code generation [7,8]. 

An example block diagram for a convolutional interleaver is shown in Figure 8.20. The encoder output is
multiplexed into buffers of increasing size, from no buffering to a buffer of size N - 1. The channel input is
similarly multiplexed from these buffers into the channel. The reverse operation is performed at the decoder. Thus,
the convolutional interleaver delays the transmission through the channel of the encoder output by progressively
larger amounts, and this delay schedule is reversed at the receiver. This interleaver takes sequential outputs of the
encoder and separates them by N - 1 other symbols in the channel transmission, thereby breaking up burst errors
in the channel. Note that a convolutional interleaver can also be used with a block code, but it is most commonly used
with a convolutional code. The total memory associated with the convolutional interleaver is $.5N(N - 1)$ and the
delay is $N(N - 1)T_s$ [1], where $T_s$ is the symbol time for transmitting the coded symbols over the channel.
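
The buffer structure can be sketched with a bank of FIFO delay lines visited cyclically. The sketch assumes N branches whose buffers hold 0, 1, ..., N - 1 symbols at the transmitter and the reversed schedule at the receiver, and it assumes the buffers start filled with a placeholder value. The class name and the choice N = 4 are illustrative.

```python
from collections import deque


class DelayBank:
    """N branches visited cyclically; branch i is a FIFO holding delays[i] symbols."""

    def __init__(self, delays, fill=None):
        self.lines = [deque([fill] * d) for d in delays]
        self.branch = 0

    def push(self, symbol):
        """Feed one symbol into the current branch and return that branch's output."""
        line = self.lines[self.branch]
        self.branch = (self.branch + 1) % len(self.lines)
        line.append(symbol)
        return line.popleft()

    def memory(self):
        """Total number of symbols stored across all branches."""
        return sum(len(line) for line in self.lines)


N = 4
interleaver = DelayBank(range(N))                 # buffers of size 0, 1, ..., N-1
deinterleaver = DelayBank(range(N - 1, -1, -1))   # reversed delay schedule

source = list(range(40))
channel = [interleaver.push(x) for x in source]
output = [deinterleaver.push(x) for x in channel]
delay = N * (N - 1)   # end-to-end delay in symbol times
```

In this sketch the interleaver stores N(N - 1)/2 symbols and the source sequence reappears at the deinterleaver output after N(N - 1) symbol times, consistent with the memory and delay expressions above.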



Figure 8.20: Convolutional Coding and Interleaving

The probability of error analysis for convolutional coding and interleaving is given in [2, Section 14.6] under 
similar assumptions as the block fading analysis. The Chernoff bound again yields probability of error under soft- 
decision decoding with a diversity order based on the minimum free distance of the code. Hard decision decoding 
reduces this diversity by a factor of two. 



Example 8.9: Consider a channel with coherence time $T_c$ = 12.5 msec and a coded bit rate of $R_s$ = 100,000
symbols per second. Find the delay of a convolutional interleaver that achieves independent fading
between subsequent coded bits.

Solution: For the convolutional interleaver, each subsequent coded bit is separated by $NT_s$, and we require
$NT_s > T_c$ for independent fading, where $T_s = 1/R_s$. Thus we have $N > T_c/T_s = .0125/.00001 = 1250$. Note that
this is the same as the required depth for a block interleaver to get independent fading on each coded bit. The total
delay is $N(N - 1)T_s \approx 15.6$ s. This is a very high delay for either voice or data.
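
The calculation can be reproduced as follows, taking the symbol rate as $10^5$ symbols per second, which is the value implied by the symbol time of .00001 s used in the solution, and taking N at the boundary value $T_c/T_s$. Variable names are illustrative.

```python
T_c = 12.5e-3            # coherence time in seconds
R_s = 100_000            # coded symbols per second (implied by T_s = 1e-5 s)

T_s = 1.0 / R_s
N = round(T_c / T_s)               # number of branches at the boundary N = T_c / T_s
memory = N * (N - 1) // 2          # symbols stored, 0.5 N (N - 1)
delay_s = N * (N - 1) * T_s        # total delay N (N - 1) T_s in seconds
```

Evaluating the delay expression exactly gives about 15.6 s, on the order of the roughly 15 s quoted in the solution, with a memory of 780,625 symbols.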



8.8.3 Coded Modulation with Symbol/Bit Interleaving 

As with block and convolutional codes, coded modulation designed for an AWGN channel performs poorly in 
fading. This leads to the notion of coded modulation with interleaving for fading channels. However, unlike block 
and convolutional codes, there are two options for interleaving in coded modulation. One option is to interleave 
the bits and then map them to modulated symbols. This is called bit-interleaved coded modulation (BICM). 
Alternatively, the modulation and coding can be done jointly as in coded modulation for AWGN channels and the
resulting symbols interleaved prior to transmission. This technique is called symbol-interleaved coded modulation 
(SICM). 

SICM seems at first like the natural approach, since it preserves joint coding and modulation, the main design 
premise behind coded modulation. However, the coded modulation design criterion must be changed in fading, 
since performance in fading depends on the code diversity as characterized by its Hamming distance rather than 
its Euclidean distance. Initial work on coded modulation for fading channels focused on techniques to maximize 
diversity in SICM. However, good design criteria were hard to obtain, and the performance of these codes was 
somewhat disappointing [47, 48, 49]. 

A major breakthrough in the design of coded modulation for fading channels was the discovery of bit-interleaved
coded modulation (BICM) [51, 50]. In BICM the code diversity equals the smallest number of distinct
bits (rather than channel symbols) along any error event. This is achieved by bit-wise interleaving at the encoder
output prior to symbol mapping, with an appropriate soft-decision bit metric as an input to the Viterbi decoder. 
While this breaks the coded modulation paradigm of joint modulation and coding, it provides much better perfor- 
mance than SICM. Moreover, [50] provided analytical tools for evaluating the performance of BICM as well as 
design guidelines for good performance. BICM is now the dominant technique for improving the performance of 
coded modulation in fading channels. 

8.9 Unequal Error Protection Codes 

When not all bits transmitted over the channel have the same priority or bit error probability requirement,
multiresolution or unequal error protection (UEP) codes can be used. This scenario arises, for example, in voice and
data systems where voice is typically more tolerant to bit errors than data: data packets received in error must
be retransmitted, so $P_b < 10^{-6}$ is typically required, whereas good quality voice requires only on the order of
$P_b < 10^{-3}$. This scenario also arises for certain types of compression. For example, in image compression, bits
corresponding to the low resolution reproduction of the image are required, whereas high resolution bits simply
refine the image. With multiresolution channel coding, all bits are received correctly with a high probability under
benign channel conditions. However, if the channel is in a deep fade, only the high priority bits or bits requiring low
$P_b$ will be received correctly with high probability.
Practical implementation of a multilevel code was first studied by Imai and Hirakawa [52]. Binary UEP
codes were later considered both for combined speech and channel coding [53], and combined image and channel
coding [54]. These implementations use traditional (block or convolutional) error-correction codes, so coding
gain is directly proportional to bandwidth expansion. Subsequently, two bandwidth-efficient implementations for
UEP have been proposed: time-multiplexing of bandwidth-efficient coded modulation [55], and coded-modulation
techniques applied to both uniform and nonuniform signal constellations [56, 57]. All of these multilevel codes can
be designed for either AWGN or fading channels. We now briefly summarize these UEP techniques; specifically,
we describe the principles behind multilevel coding and multistage decoding, and the more complex
bandwidth-efficient implementations.

A block diagram of a general multilevel encoder is shown in Figure 8.21. The source encoder first divides
the information sequence into M parallel bit streams of decreasing priority. The channel encoder consists of M
different binary error-correcting codes $C_1, \ldots, C_M$ with decreasing codeword distances. The ith priority bit stream
enters the ith encoder, which generates the coded bits $s_i$. If the $2^M$ points in the signal constellation are numbered
from 0 to $2^M - 1$, then the point selector chooses the constellation point s corresponding to

$$s = \sum_{i=1}^{M} s_i \times 2^{i-1}. \qquad (8.81)$$

For example, if M = 3 and the signal constellation is 8PSK, then the chosen signal point will have phase $2\pi s/8$.
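
The point selector can be sketched in a few lines, reading the selection rule as $s = \sum_{i=1}^{M} s_i 2^{i-1}$, so that the bit from the highest-priority encoder is the least significant bit of the point index. The function names and the unit-energy PSK mapping are illustrative assumptions.

```python
import cmath
import math


def select_point(coded_bits):
    """Return s = sum_i s_i * 2**(i-1), where coded_bits[i-1] holds s_i."""
    return sum(bit << i for i, bit in enumerate(coded_bits))


def psk_symbol(s, num_levels):
    """Unit-energy PSK point with phase 2*pi*s / 2**num_levels."""
    return cmath.exp(2j * math.pi * s / 2 ** num_levels)


M = 3
s = select_point([1, 0, 1])          # s_1 = 1, s_2 = 0, s_3 = 1
symbol = psk_symbol(s, M)
phase = cmath.phase(psk_symbol(2, M))  # point s = 2 of 8PSK
```

Here the bits (1, 0, 1) select point s = 5, and point s = 2 of the 8PSK constellation has phase $\pi/2$.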




Figure 8.21: Multilevel Encoder 

Optimal decoding of the multilevel code uses a maximum-likelihood decoder, which determines the input
sequence that maximizes the received sequence probability. The maximum-likelihood decoder must therefore jointly
decode the code sequences $s_1, \ldots, s_M$. This can entail significant complexity even if the individual codes in the
multilevel code have low complexity. For example, if the component codes are convolutional codes with $2^{\mu_i}$ states,
$i = 1, \ldots, M$, the number of states in the optimal decoder is $2^{\mu_1 + \cdots + \mu_M}$. Due to the high complexity of optimal
decoding, the suboptimal technique of multistage decoding, introduced in [52], is used for most implementations.
Multistage decoding is accomplished by decoding the component codes sequentially. First, the most robust code,
$C_1$, is decoded, then $C_2$, and so forth. Once the code sequence corresponding to encoder $C_1$ is estimated, it is
assumed correct for code decisions on the less robust code sequences.

The binary encoders of this multilevel code require extra code bits to achieve their coding gain, thus they are
not bandwidth-efficient. An alternative approach recently proposed in [56] uses time-multiplexing of the trellis
codes described in Section 8.7. In this approach, different conventional coded modulation schemes, such as lattice
or trellis codes, with different coding gains are used for each priority class of input data. The transmit signal
constellations corresponding to each encoder may differ in size (number of signal points), but the average power
of each constellation is the same. The signal points output by each of the individual encoders are then
time-multiplexed together for transmission over the channel, as shown in Figure 8.22 for two different priority bit
streams. Let $R_i$ denote the bit rate of encoder $C_i$ in this figure, for i = 1, 2. If $T_1$ equals the fraction of time that
the high-priority $C_1$ code is transmitted, and $T_2$ equals the fraction of time that the $C_2$ code is transmitted, then the
total bit rate is $(R_1T_1 + R_2T_2)/(T_1 + T_2)$, with the high-priority bits comprising a fraction $R_1T_1/(R_1T_1 + R_2T_2)$
of this total.
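
The rate expressions for the two-stream case are simple to evaluate. The sketch below takes the total rate as $(R_1T_1 + R_2T_2)/(T_1 + T_2)$ and the high-priority share as $R_1T_1/(R_1T_1 + R_2T_2)$. The function name and the numerical rates and time fractions are illustrative assumptions, not values from the text.

```python
def time_multiplexed_rates(R1, T1, R2, T2):
    """Return (total bit rate, fraction of bits that are high priority)."""
    weighted = R1 * T1 + R2 * T2
    total_rate = weighted / (T1 + T2)
    high_priority_fraction = R1 * T1 / weighted
    return total_rate, high_priority_fraction


# Illustrative values: a 2 bits/symbol high-priority code sent 25% of the time,
# and a 4 bits/symbol low-priority code sent 75% of the time.
total_rate, high_fraction = time_multiplexed_rates(R1=2.0, T1=0.25, R2=4.0, T2=0.75)
```

With these values the total rate is 3.5 bits/symbol, of which one seventh is high-priority data. Increasing $T_1$ protects more of the data with the stronger code at the cost of total rate.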




Figure 8.22: Transceiver for Time-Multiplexed Coded Modulation

The time-multiplexed coding method yields a higher gain if the constellation maps $S_1$ and $S_2$ of Figure 8.22
are designed jointly. This revised scheme is shown in Figure 8.23 for 2 encoders, where the extension to M
encoders is straightforward. Recall that in trellis coding, bits are encoded to select the lattice subset, and uncoded
bits choose the constellation point within the subset. The binary encoder properties reduce the $P_b$ for the encoded
bits only; the $P_b$ for the uncoded bits is determined by the separation of the constellation signal points. We can
easily modify this scheme to yield two levels of coding gain, where the high-priority bits are heavily encoded and
used to choose the subset of the partitioned constellation, while the low-priority bits are uncoded or lightly coded
and used to select the constellation signal point.

8.10 Joint Source and Channel Coding 

The underlying premise of UEP codes is that the bit error probabilities of the channel code should be matched to
the priority or $P_b$ requirements associated with the bits to be transmitted. These bits are often taken from the output
of a compression algorithm acting on the original data source. Hence, UEP coding can be considered as a joint 
design between compression (also called source coding) and channel coding. Although Shannon determined that 
the source and channel codes can be designed separately on an AWGN channel with no loss in optimality [59], 
this result holds only in the limit of infinite source code dimension, infinite channel code block length, and infinite 
complexity and delay. Thus, there has been much work on investigating the benefits of joint source and channel 
coding under more realistic system assumptions. 
Figure 8.23: Joint Optimization of Signal Constellation



Previous work in the area of joint source and channel coding falls into several broad categories: source- 
optimized channel coding, channel-optimized source coding, and iterative algorithms, which combine these two 
code designs. In source-optimized channel coding, the source code is designed for a noiseless channel. A channel
code is then designed for this source code to minimize end-to-end distortion over the given channel based on the
distortion associated with corruption of the different transmitted bits. UEP channel coding where the $P_b$ of the
different component channel codes is matched to the bit priorities associated with the source code is an example of
this technique. Source-optimized channel coding has been applied to image compression with convolutional channel
coding and with rate-compatible punctured convolutional (RCPC) channel codes in [54, 60, 61]. A comprehensive
treatment of matching RCPC channel codes or multilevel quadrature amplitude modulation (MQAM) to subband
and linear predictive speech coding in both AWGN and Rayleigh fading channels can be found in [62]. In
source-optimized modulation, the source code is designed for a noiseless channel and then the modulation is optimized
to minimize end-to-end distortion. An example of this approach is given in [63], where compression by a vector 
quantizer (VQ) is followed by multicarrier modulation, and the modulation provides unequal error protection to 
the different source bits by assigning different powers to each subcarrier. 

Channel-optimized source coding is another approach to joint source and channel coding. In this technique 
the source code is optimized based on the error probability associated with the channel code, where the channel 
code is designed independent of the source. Examples of work taking this approach include the channel-optimized 
vector quantizer (COVQ) and its scalar variation [64, 65]. Source-optimized channel coding and modulation can 
be combined with channel-optimized source coding using an iterative design. This approach is used for the joint 
design of a COVQ and multicarrier modulation in [66] and for the joint design of a COVQ and RCPC channel code 
in [67]. Combined trellis coded modulation and trellis-coded quantization, a source coding strategy that borrows 
from the basic premise of trellis-coded modulation, is investigated in [68, 69]. All of this work on joint source and 
channel code design indicates that significant performance advantages are possible when the source and channel 
codes are jointly designed. Moreover, many sophisticated channel code designs, such as turbo and LDPC codes, 
have not yet been combined with source codes in a joint optimization. Thus, much more work is needed in the 
broad area of joint source and channel coding to optimize performance for different applications. 
Chapter 8 Problems

1. Consider a (3,1) linear block code where each codeword consists of 3 data bits and one parity bit. 

(a) Find all codewords in this code. 

(b) Find the minimum distance of the code. 

2. Consider a (7,4) code with generator matrix 

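\[
G = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]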

(a) Find all the codewords of the code. 

(b) What is the minimum distance of the code? 

(c) Find the parity check matrix of the code. 

(d) Find the syndrome for the received vector R = [1101011]. 

(e) Assuming an information bit sequence of all 0s, find all minimum weight error patterns e that result in 
a valid codeword that is not the all zero codeword. 

(f) Use row and column operations to reduce G to systematic form and find its corresponding parity check 
matrix. Sketch a shift register implementation of this systematic code. 

3. All Hamming codes have a minimum distance of 3. What is the error-correction and error-detection capa- 
bility of a Hamming code? 

4. The (15,11) Hamming code has generator polynomial $g(X) = 1 + X + X^4$. Determine if the codewords
described by polynomials $c_1(X) = 1 + X + X^3 + X^7$ and $c_2(X) = 1 + X^3 + X^5 + X^6$ are valid codewords
for this generator polynomial. Also find the systematic form of this polynomial, $p(X) + X^{n-k} u(X)$, that
generates the codewords in systematic form.

5. The (7,4) cyclic Hamming code has a generator polynomial $g(X) = 1 + X^2 + X^3$.

(a) Find the generator matrix for this code in systematic form. 

(b) Find the parity check matrix for the code. 

(c) Suppose the codeword C = [1011010] is transmitted through a channel and the corresponding received 
codeword is C = [1010011]. Find the syndrome polynomial associated with this received codeword. 

(d) Find all possible received codewords such that for transmitted codeword C = [1011010], the received 
codeword has a syndrome polynomial of zero. 

6. The weight distribution of a Hamming code of block length n is given by 

\[
N(x) = \sum_{i=0}^{n} N_i x^i = \frac{1}{n+1}\left[(1+x)^n + n(1+x)^{.5(n-1)}(1-x)^{.5(n+1)}\right],
\]

where $N_i$ denotes the number of codewords of weight $i$.

(a) Use this formula to determine the weight distribution of a Hamming (7,4) code. 
(b) Use the weight distribution from part (a) to find the union upper bound based on weight distribution
(8.38) for a Hamming (7,4) code, assuming BPSK modulation of the coded bits with $\gamma = 10$ dB.
Compare with the probability of error from the looser bound (8.39) for the same modulation. 

7. Find the union upper bound on probability of codeword error for a Hamming code with m = 7. Assume the 
coded bits are transmitted over an AWGN channel using 8 PSK modulation with an SNR of 10 dB. Compute 
the probability of bit error for the code assuming a codeword error corresponds to one bit error, and compare 
with the bit error probability for uncoded modulation. 

8. Plot $P_b$ versus $\gamma_b$ for a (5,2) linear block code with $d_{min} = 3$ and $0 < E_b/N_0 < 20$ dB using the union
bound for probability of codeword error. Assume the coded bits are transmitted over the channel using
QPSK modulation. Over what range of $E_b/N_0$ does the code exhibit negative coding gain?

9. Find the approximate coding gain (8.47) of a (7,4) Hamming code with SDD over uncoded modulation 
assuming $\gamma_b = 15$ dB.

10. Plot the probability of codeword error for a (24,12) code with $d_{min} = 8$ for $0 < \gamma_b < 10$ dB under both hard
and soft decoding, using the union bound for hard decoding and the approximation (8.47) for soft decoding. 
What is the difference in coding gain at high SNR for the two decoding techniques? 

11. Evaluate the upper and lower bounds on codeword error probability, (8.35) and (8.36) respectively, for an
extended Golay code with HDD, assuming an AWGN channel with BPSK modulation and an SNR of 10 
dB. 

12. Consider a Reed Solomon code with k = 3 and K = 4, mapping to 8PSK modulation. Find the number of
errors that can be corrected with this code and its minimum distance. Also find its probability of bit error
assuming the coded symbols transmitted over the channel via 8PSK have $P_M = 10^{-3}$.

13. In a Rayleigh fading channel, determine an upper bound for the bit error probability $P_b$ of a Golay (23,12)
code with deep interleaving ($dT_s \gg T_c$), BPSK modulation, soft-decision decoding, and an average coded
$E_c/N_0$ of 15 dB. Compare with the uncoded $P_b$ in Rayleigh fading.

14. Consider a Rayleigh fading channel with BPSK modulation, average SNR of dB, and a Doppler of 80 Hz. The
data rate over the channel is 30 Kbps. Assume that bit errors occur on this channel whenever $P_b(\gamma) > 10^{-2}$.
Design an interleaver and associated (n, k) block code which corrects essentially all of the bit errors, where 
the interleaver delay is constrained to be less than 5 msec. Your design should include the dimensions of the 
interleaver, as well as the block code type and the values of n and k. 

15. For the trellis of Figure 8.7, determine the state sequence and encoder output assuming an initial state S = 00 
and information bit sequence U = [0110101101]. 

16. Consider the convolutional code generated by the encoder shown in Figure 8.24. 

(a) Sketch the trellis diagram of the code. 

(b) Find the path metric for the all-zero path, assuming probability of symbol error $p = 10^{-3}$.

(c) Find one path at a minimum Hamming distance from the all-zero path and compute its path metric for 
the same $p$ as in part (b).

17. This problem is based on the convolutional encoder of Figure 8.24. 



Figure 8.24: Convolutional Encoder for Problems 16 and 17 



(a) Draw the state diagram for this convolutional encoder. 

(b) Determine its transfer function $T(D, N, J)$.

(c) Determine the minimum distance of paths through the trellis to the all-zero path. 

(d) Compute the upper bound (8.75) on probability of bit error for this code assuming SDD and BPSK 
modulation with $\gamma_b = 10$ dB.

(e) Compute the upper bound (8.76) on probability of bit error for this code assuming HDD and BPSK 
modulation with $\gamma_b = 10$ dB. How much coding gain is achieved with soft versus hard decoding?

18. Consider a channel with coherence time $T_c = 10$ msec and a coded bit rate of $R_s = 50,000$ Kilosymbols
per second. Find the average delay of a convolutional interleaver that achieves independent fading between 
subsequent coded bits. Also find the memory requirements of the system. 

19. Suppose you have a 16QAM signal constellation which is trellis encoded using the following scheme: 

The set partitioning for 16 QAM is shown in Figure 8.18. 

(a) Assuming that parallel transitions dominate the error probability, what is the coding gain of this trellis 
code relative to uncoded 8PSK, given that $d_0$ for the 16QAM is .632 and $d_0$ for the 8PSK is .765?

(b) Draw the trellis for this scheme, and assign subsets to the transitions according to the heuristic rules of 
Ungerboeck. 

(c) What is the minimum distance error event through the trellis relative to the path generated by the all 
zero bit stream? 

(d) Assuming that your answer to part (c) is the minimum distance error event for the trellis, what is $d_{min}$
of the code?

(e) Draw the trellis structure and assign transitions assuming that the convolutional encoder is rate 2/3 (so 
uncoded bits $b_2$ and $b_3$ are input, and 3 coded bits are output).
Figure 8.25: 16QAM Trellis Encoder.



20. Assume a multilevel encoder as in Figure 8.21 where the information bits have three different error protection 
levels ($M = 3$) and the three encoder outputs are modulated using 8PSK modulation. Assume the code $C_i$
associated with the $i$th bit stream $b_i$ is a Hamming code with parameter $m_i$, where $m_1 = 2$, $m_2 = 3$, and
$m_3 = 4$.

(a) Find the probability of error for each Hamming code $C_i$ assuming it is decoded individually using
HDD. 

(b) If the symbol time of the 8PSK modulation is $T_s = 10$ $\mu$sec, what is the data rate for each of the 3 bit
streams? 

(c) For what size code must the maximum-likelihood decoder of this UEP code be designed? 

21. Design a two-level UEP code using either Hamming or Golay codes such that for a channel with an SNR of 
10 dB, the UEP code has $P_b = 10^{-3}$ for the low-priority bits and $P_b = 10^{-6}$ for the high-priority bits.
Chapter 9

Adaptive Modulation and Coding 



Adaptive modulation and coding enables robust and spectrally-efficient transmission over time-varying channels. 
The basic premise is to estimate the channel at the receiver and feed this estimate back to the transmitter, so that 
the transmission scheme can be adapted relative to the channel characteristics. Modulation and coding techniques 
that do not adapt to fading conditions require a fixed link margin to maintain acceptable performance when the 
channel quality is poor. Thus, these systems are effectively designed for the worst-case channel conditions. Since 
Rayleigh fading can cause a signal power loss of up to 30 dB, designing for the worst case channel conditions can 
result in very inefficient utilization of the channel. Adapting to the channel fading can increase average throughput, 
reduce required transmit power, or reduce average probability of bit error by taking advantage of favorable channel 
conditions to send at higher data rates or lower power, and reducing the data rate or increasing power as the channel 
degrades. In Chapter 4.2.4, the optimal adaptive transmission scheme that achieves the Shannon capacity of a flat- 
fading channel was derived. In this chapter we describe more practical adaptive modulation and coding techniques 
to maximize average spectral efficiency while maintaining a given average or instantaneous bit error probability. 
The same basic premise can be applied to MIMO channels, frequency-selective fading channels with equalization, 
OFDM, or CDMA, and cellular systems. The application of adaptive techniques to these systems will be described 
in subsequent chapters. 

Adaptive transmission was first investigated in the late sixties and early seventies [1, 2]. Interest in these
techniques was short-lived, perhaps due to hardware constraints, lack of good channel estimation techniques, 
and/or systems focusing on point-to-point radio links without transmitter feedback. As technology evolved these 
issues became less constraining, resulting in a revived interest in adaptive modulation methods for 3rd generation 
wireless systems [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. As a result, many wireless systems, including both GSM and 
CDMA cellular systems as well as wireless LANs, are using or planning to use adaptive transmission techniques 
[13, 14, 15, 16]. 

There are several practical constraints that determine when adaptive modulation should be used. Adaptive
modulation requires a feedback path between the transmitter and receiver, which may not be feasible for some 
systems. Moreover, if the channel is changing faster than it can be reliably estimated and fed back to the transmitter, 
adaptive techniques will perform poorly. Many wireless channels exhibit variations on different timescales, for 
example multipath fading, which can change very quickly, and shadowing, which changes more slowly. Often 
only the slow variations can be tracked and adapted to, in which case flat fading mitigation is needed to address the 
effects of multipath. Hardware constraints may dictate how often the transmitter can change its rate and/or power, 
and this may limit the performance gains possible with adaptive modulation. Finally, adaptive modulation typically 
varies the rate of data transmission relative to channel conditions. We will see that average spectral efficiency of 
adaptive modulation under an average power constraint is maximized by setting the data rate to be small or zero 
in poor channel conditions. However, with this scheme the quality of fixed-rate applications with hard delay 
constraints such as voice or video may be significantly compromised. Thus, in delay-constrained applications the
adaptive modulation should be optimized to minimize outage probability for a fixed data rate [17]. 

9.1 Adaptive Transmission System 

In this section we describe the system associated with adaptive transmission. The model is the same as the model
of Chapter 4.2.1 used to determine the capacity of flat-fading channels. We assume linear modulation where the
adaptation takes place at a multiple of the symbol rate $R_s = 1/T_s$. We also assume the modulation uses
ideal Nyquist data pulses ($\mathrm{sinc}[t/T_s]$), so the signal bandwidth is $B = 1/T_s$. We model the flat-fading channel as a
discrete-time channel where each channel use corresponds to one symbol time $T_s$. The channel has stationary and
ergodic time-varying gain $\sqrt{g[i]}$ that follows a given distribution $p(g)$ and AWGN $n[i]$, with power spectral density
$N_0/2$. Let $\bar{S}$ denote the average transmit signal power, $B = 1/T_s$ denote the received signal bandwidth, and $\bar{g}$
denote the average channel gain. The instantaneous received SNR is then $\gamma[i] = \bar{S} g[i]/(N_0 B)$, $0 \le \gamma[i] < \infty$, and
its expected value over all time is $\bar{\gamma} = \bar{S}\bar{g}/(N_0 B)$. Since $g[i]$ is stationary, the distribution of $\gamma[i]$ is independent
of $i$, and we denote this distribution by $p(\gamma)$.

In adaptive transmission we estimate the power gain or received SNR at time $i$ and adapt the modulation
and coding parameters accordingly. The most common parameters to adapt are the data rate $R[i]$, transmit power
$S[i]$, and coding parameters $C[i]$. For $M$-ary modulation the data rate is $R[i] = \log_2 M[i]/T_s = B \log_2 M[i]$ bps.
The spectral efficiency of the $M$-ary modulation is $R[i]/B = \log_2 M[i]$ bps/Hz. We denote the SNR estimate
as $\hat{\gamma}[i] = \bar{S}\hat{g}[i]/(N_0 B)$, which is based on the power gain estimate $\hat{g}[i]$. Suppose the transmit power is adapted
relative to $\hat{\gamma}[i]$. We denote this adaptive transmit power at time $i$ by $S(\hat{\gamma}[i]) = S[i]$, and the received power at
time $i$ is then $\gamma[i] S(\hat{\gamma}[i])/\bar{S}$. Similarly, we can adapt the data rate of the modulation $R(\hat{\gamma}[i]) = R[i]$ and/or the coding
parameters $C(\hat{\gamma}[i]) = C[i]$ relative to the estimate $\hat{\gamma}[i]$. When the context is clear, we will omit the time reference
$i$ relative to $\gamma$, $S(\gamma)$, $R(\gamma)$, and $C(\gamma)$.

The system model is illustrated in Figure 9.1. We assume that an estimate $\hat{g}[i]$ of the channel power gain $g[i]$
at time $i$ is available to the receiver after an estimation time delay of $i_e$ and that this same estimate is available
to the transmitter after a combined estimation and feedback path delay of $i_d = i_e + i_f$. The availability of this
channel information at the transmitter allows it to adapt its transmission scheme relative to the channel variation.
The adaptive strategy may take into account the estimation error and delay in $\hat{g}[i]$ or it may treat $\hat{g}[i]$ as the true
gain; this issue will be discussed in more detail in Section 9.3.7. We assume that the feedback path does not
introduce any errors, which is a reasonable assumption if strong error correction and detection codes are used on 
the feedback path and packets associated with detected errors are retransmitted. 



Figure 9.1: System Model (transmitter, channel, and receiver, with a feedback channel from receiver to transmitter).
The rate of channel variation will dictate how often the transmitter must adapt its transmission parameters,
and will also impact the estimation error of $g[i]$. When the channel gain consists of both fast and slow fading
components, the adaptive transmission may adapt to both if $g[i]$ changes sufficiently slowly, or it may adapt to
just the slow fading. In particular, if $g[i]$ corresponds to shadowing and multipath fading, then at low speeds the
shadowing is essentially constant, and the multipath fading is sufficiently slow so that it can be estimated and fed 
back to the transmitter with an estimation error and delay that does not significantly degrade performance. At high 
speeds the system can no longer effectively estimate and feed back the multipath fading in order to adapt to it. 
In this case, the adaptive transmission responds to the shadowing variations only, and the error probability of the 
modulation must be averaged over the fast fading distribution. Adaptive techniques for combined fast and slow 
fading are discussed in Section 9.5.

9.2 Adaptive Techniques 

There are many parameters that can be varied at the transmitter relative to the channel gain $\gamma$. In this section we
discuss adaptive techniques associated with variation of the most common parameters: data rate, power, coding, 
error probability, and combinations of these adaptive techniques. 

9.2.1 Variable-Rate Techniques 

In variable-rate modulation the data rate $R[\gamma]$ is varied relative to the channel gain $\gamma$. This can be done by fixing
the symbol rate $R_s = 1/T_s$ of the modulation and using multiple modulation schemes or constellation sizes,
or by fixing the modulation (e.g. BPSK) and changing the symbol rate. Symbol rate variation is difficult to
implement in practice since a varying signal bandwidth is impractical and complicates bandwidth sharing. In
contrast, changing the constellation size or modulation type with a fixed symbol rate is fairly easy, and these
techniques are used in current systems. Specifically, EGPRS for data transmission in GSM cellular systems varies
between 8PSK and GMSK modulation, and GPRS for data transmission in IS-136 TDMA cellular systems can use
4, 8, and 16 level PSK modulation, although the 16 level modulation has yet to be standardized [15]. In general the
modulation parameters to vary the transmission rate are fixed over a block or frame of symbols, where the frame
size is a parameter of the design. Frames may also include pilot symbols for channel estimation and other control 
information. 

When a discrete set of modulation types or constellation sizes is used, each value of $\gamma$ must be mapped to
one of the possible modulation schemes. This is often done to maintain the bit error probability of each scheme
below a given value. These ideas are illustrated in the following example as well as in subsequent sections on
specific adaptive modulation techniques. 



Example 9.1: Consider an adaptive modulation system that uses QPSK and 8PSK for a target $P_b$ of approximately
$10^{-3}$. If the target $P_b$ cannot be met with either scheme, then no data is transmitted. Find the range of $\gamma$ values
associated with the three possible transmission schemes (no transmission, QPSK, and 8PSK) as well as the average
spectral efficiency of the system, assuming Rayleigh fading with $\bar{\gamma} = 20$ dB.

Solution: First note that the SNR $\gamma = \gamma_s$ for both QPSK and 8PSK. From Chapter 6.1 we have $P_b \approx Q(\sqrt{\gamma})$ for
QPSK and $P_b \approx .666\,Q(\sqrt{2\gamma}\sin(\pi/8))$ for 8PSK. Since $\gamma > 14.79$ dB yields $P_b < 10^{-3}$ for 8PSK, the adaptive
modulation uses 8PSK modulation for $\gamma > 14.79$ dB. Since $\gamma > 10.35$ dB yields $P_b < 10^{-3}$ for QPSK, the
adaptive modulation uses QPSK modulation for $10.35 < \gamma < 14.79$ dB. The channel is not used for $\gamma < 10.35$ dB.

We determine the average rate by analyzing how often each of the different transmission schemes is used.
Since 8PSK is used when $\gamma > 14.78$ dB $= 30.1$, in Rayleigh fading with $\bar{\gamma} = 20$ dB the spectral efficiency
$R[\gamma]/B = \log_2 8 = 3$ bps/Hz is transmitted a fraction of time equal to $P_8 = \int_{30.1}^{\infty} \frac{1}{100} e^{-\gamma/100}\, d\gamma = .74$. QPSK
is used when $10.35 < \gamma < 14.78$ dB, where 10.35 dB $= 10.85$ in linear units. So $R[\gamma]/B = \log_2 4 = 2$ bps/Hz is
transmitted a fraction of time equal to $P_4 = \int_{10.85}^{30.1} \frac{1}{100} e^{-\gamma/100}\, d\gamma = .157$. During the remaining .103 fraction of
time there is no data transmission. So the average spectral efficiency is $.74 \times 3 + .157 \times 2 + .103 \times 0 = 2.534$
bps/Hz.

Note that when $\gamma < 10.35$ dB, rather than suspending transmission, which leads to an outage probability of
roughly .1, either just one signaling dimension could be used (i.e. BPSK could be transmitted) or error correction
coding could be added to the QPSK to meet the $P_b$ target. If block or convolutional codes were used then the
spectral efficiency for $\gamma < 10.35$ dB would be less than 2 bps/Hz, but larger than a spectral efficiency of zero
corresponding to no transmission. These variable-coding techniques are described in Section 9.2.4.
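
The averaging step of Example 9.1 can be checked numerically. The following is a minimal sketch, assuming the switching thresholds (10.35 dB and 14.78 dB) and the 20 dB average SNR quoted in the example; the function names are illustrative and not part of the text. It uses the fact that in Rayleigh fading the SNR is exponentially distributed, so the probability of each SNR region is a difference of exponentials.

```python
import math


def db_to_linear(value_db):
    """Convert a dB value to linear units."""
    return 10.0 ** (value_db / 10.0)


def region_probability(low, high, mean_snr):
    """P(low <= gamma < high) when gamma is exponential with the given mean."""
    return math.exp(-low / mean_snr) - math.exp(-high / mean_snr)


def average_spectral_efficiency(thresholds_db, bits_per_symbol, mean_snr_db):
    """Return (average bps/Hz, fraction of time each scheme is used).

    thresholds_db[k] is the lowest SNR (dB) at which scheme k is used;
    scheme k carries bits_per_symbol[k] bits. Below thresholds_db[0]
    nothing is transmitted.
    """
    mean_snr = db_to_linear(mean_snr_db)
    edges = [db_to_linear(t) for t in thresholds_db] + [math.inf]
    fractions = [
        region_probability(edges[k], edges[k + 1], mean_snr)
        for k in range(len(bits_per_symbol))
    ]
    average = sum(b * p for b, p in zip(bits_per_symbol, fractions))
    return average, fractions


# Thresholds and mean SNR as quoted in Example 9.1 (QPSK: 2 bits, 8PSK: 3 bits).
avg, (p_qpsk, p_8psk) = average_spectral_efficiency([10.35, 14.78], [2, 3], 20.0)
print(f"QPSK fraction: {p_qpsk:.3f}, 8PSK fraction: {p_8psk:.3f}")
print(f"Average spectral efficiency: {avg:.3f} bps/Hz")
```

The same routine accepts any ordered list of thresholds, so adding a BPSK region below 10.35 dB only requires one more threshold and one more entry in the bits-per-symbol list.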



9.2.2 Variable-Power Techniques 

Adapting the transmit power alone is generally used to compensate for SNR variation due to fading. The goal is to
maintain a fixed bit error probability or, equivalently, a constant received SNR. The power adaptation thus inverts
the channel fading so that the channel appears as an AWGN channel to the modulator and demodulator.$^1$ The
power adaptation for channel inversion is given by

\[
\frac{S(\gamma)}{\bar{S}} = \frac{\sigma}{\gamma}, \tag{9.1}
\]

where $\sigma$ equals the constant received SNR. The average power constraint $\bar{S}$ implies that

\[
\int \frac{S(\gamma)}{\bar{S}}\, p(\gamma)\, d\gamma = \int \frac{\sigma}{\gamma}\, p(\gamma)\, d\gamma = 1. \tag{9.2}
\]

Solving (9.2) for $\sigma$ yields $\sigma = 1/E[1/\gamma]$, so $\sigma$ is determined by $p(\gamma)$, which in turn depends on the average
transmit power $\bar{S}$ through $\bar{\gamma}$. Thus, for a given average power $\bar{S}$, if the value for $\sigma$ required to meet the target BER
is greater than $1/E[1/\gamma]$ then this target cannot be met. Note that for Rayleigh fading where $\gamma$ is exponentially
distributed, $E[1/\gamma] = \infty$, so no target $P_b$ can be met using channel inversion.

The fading can also be inverted above a given cutoff $\gamma_0$, which leads to a truncated channel inversion for
power adaptation. In this case the power adaptation is given by

\[
\frac{S(\gamma)}{\bar{S}} = \begin{cases} \dfrac{\sigma}{\gamma}, & \gamma \ge \gamma_0 \\ 0, & \gamma < \gamma_0 \end{cases} \tag{9.3}
\]

The cutoff value $\gamma_0$ can be based on a desired outage probability $P_{out} = p(\gamma < \gamma_0)$ or based on a desired target
BER above a cutoff that is determined by the target BER and $p(\gamma)$. Since the channel is only used when $\gamma \ge \gamma_0$,
given an average power $\bar{S}$ we have $\sigma = 1/E_{\gamma_0}[1/\gamma]$, where

\[
E_{\gamma_0}[1/\gamma] = \int_{\gamma_0}^{\infty} \frac{1}{\gamma}\, p(\gamma)\, d\gamma. \tag{9.4}
\]



$^1$Channel inversion and truncated channel inversion were discussed in Chapter 4.2.4 in the context of fading channel capacity.

Example 9.2: Find the power adaptation for BPSK modulation that maintains a fixed $P_b = 10^{-3}$ in nonoutage for
a Rayleigh fading channel with $\bar{\gamma} = 10$ dB. Also find the outage probability that results.

Solution: The power adaptation is truncated channel inversion, so we need only find $\sigma$ and $\gamma_0$. For BPSK modulation,
with a constant SNR of $\sigma = 4.77$ we get $P_b = Q(\sqrt{2\sigma}) = 10^{-3}$. Setting $\sigma = 1/E_{\gamma_0}[1/\gamma]$ and solving for $\gamma_0$,
which must be done numerically, yields $\gamma_0 = .7423$. So $P_{out} = p(\gamma < \gamma_0) = 1 - e^{-\gamma_0/10} = .379$. So there is a
high outage probability, which results from requiring $P_b = 10^{-3}$ in this relatively weak channel.
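
The numerical solution for the cutoff in Example 9.2 can be sketched as follows. This is an illustration under stated assumptions, not the method used in the text: for Rayleigh fading the truncated mean in (9.4) reduces to $E_{\gamma_0}[1/\gamma] = E_1(\gamma_0/\bar{\gamma})/\bar{\gamma}$, where $E_1$ is the exponential integral, which is evaluated here by its power series and combined with a simple bisection search. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def exp_integral_e1(x, terms=60):
    """E1(x) = integral from x to infinity of exp(-t)/t dt, via its power series.

    E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!), used here for
    small to moderate x > 0.
    """
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term now equals (-x)^k / k!
        total -= term / k
    return total


def truncated_inverse_mean(cutoff, mean_snr):
    """E_{gamma0}[1/gamma] of (9.4) for exponentially distributed SNR."""
    return exp_integral_e1(cutoff / mean_snr) / mean_snr


def solve_cutoff(sigma, mean_snr, iterations=100):
    """Bisection for the cutoff gamma0 satisfying sigma * E_{gamma0}[1/gamma] = 1."""
    low, high = 1e-9 * mean_snr, 5.0 * mean_snr
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if sigma * truncated_inverse_mean(mid, mean_snr) > 1.0:
            low = mid    # average power exceeds the budget: raise the cutoff
        else:
            high = mid
    return 0.5 * (low + high)


sigma = 4.77                      # constant received SNR from Example 9.2
mean_snr = 10.0                   # 10 dB average SNR in linear units
gamma0 = solve_cutoff(sigma, mean_snr)
print(f"Q(sqrt(2*sigma)) = {q_function(math.sqrt(2.0 * sigma)):.2e}")
print(f"cutoff gamma0 = {gamma0:.4f}")
```

Bisection is used because the average power consumed by truncated inversion decreases monotonically as the cutoff increases, so the root is unique on the search interval.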



9.2.3 Variable Error Probability 

We can also adapt the instantaneous BER subject to an average BER constraint $\bar{P}_b$. In Chapter 6.3.2 we saw that
in fading channels the instantaneous error probability varies as the received SNR $\gamma$ varies, resulting in an average
BER of $\bar{P}_b = \int P_b(\gamma) p(\gamma)\, d\gamma$. This is not considered an adaptive technique since the transmitter does not adapt
to $\gamma$. Thus, in adaptive modulation error probability is typically adapted along with some other form of adaptation
such as constellation size or modulation type. Adaptation based on varying both data rate and error probability to
reduce transmit energy was first proposed by Hayes in [1], where a 4 dB power savings was obtained at a target
average bit error probability of $10^{-4}$.

9.2.4 Variable-Coding Techniques 

In adaptive coding different channel codes are used to provide different amounts of coding gain to the transmitted 
bits. For example, a stronger error correction code may be used when $\gamma$ is small, with a weaker code or no coding
used when $\gamma$ is large. Adaptive coding can be implemented by multiplexing together codes with different error
correction capabilities. However, this approach requires that the channel remain roughly constant over the block
length or constraint length of the code [7]. On such slowly-varying channels adaptive coding is particularly useful
when the modulation must remain fixed, as may be the case due to complexity or peak-to-average power ratio 
constraints. 

An alternative technique to code multiplexing is rate-compatible punctured convolutional (RCPC) codes [33]. 
RCPC codes consist of a family of convolutional codes at different code rates $R_c = k/n$. The basic premise of
RCPC codes is to have a single encoder and decoder whose error correction capability can be modified by not
transmitting certain coded bits (puncturing the code). Moreover, RCPC codes have a rate-compatibility constraint
so that the coded bits associated with a high-rate (weaker) code are also used by all lower-rate (stronger) codes.
Thus, to increase the error correction capability of the code, the coded bits of the weakest code are transmitted
along with additional coded bits to achieve the desired level of error correction. The rate compatibility makes it 
very easy to adapt the error protection of the code, since the same encoder and decoder are used for all codes in the 
RCPC family, with puncturing at the transmitter to achieve the desired error correction. Decoding is performed by 
a Viterbi algorithm operating on the trellis associated with the lowest rate code, with the puncturing incorporated 
into the branch metrics. Puncturing is a very effective and powerful adaptive coding technique, and forms the basis 
of adaptive coding in GSM’s EDGE protocol for data transmission [13]. 
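
The rate-compatibility constraint and the puncturing operation can be illustrated with a short sketch. The masks below are hypothetical and chosen only to show the nesting property; they are not taken from [33] or from any standard, where puncturing tables are selected by code search. We assume a rate-1/2 mother code and a puncturing period of 4 information bits (8 coded bits), with 1 meaning "transmit" and 0 meaning "puncture".

```python
from typing import List, Optional, Sequence

# Hypothetical puncturing masks over one period of 8 mother-code bits.
MASKS = {
    "4/5": [1, 1, 1, 0, 1, 0, 1, 0],   # weakest code: 5 of 8 bits sent
    "4/6": [1, 1, 1, 0, 1, 1, 1, 0],   # adds one bit to the 4/5 pattern
    "4/8": [1, 1, 1, 1, 1, 1, 1, 1],   # mother code: nothing punctured
}


def is_rate_compatible(high_rate: Sequence[int], low_rate: Sequence[int]) -> bool:
    """True if every bit sent by the high-rate code is also sent by the low-rate code."""
    return all(low >= high for high, low in zip(high_rate, low_rate))


def puncture(coded: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Keep only the coded bits whose (periodic) mask entry is 1."""
    return [bit for i, bit in enumerate(coded) if mask[i % len(mask)]]


def depuncture(received: Sequence[int], mask: Sequence[int],
               n_coded: int) -> List[Optional[int]]:
    """Reinsert erasures (None) at punctured positions before decoding.

    A Viterbi decoder on the mother-code trellis would simply skip the
    None positions when computing branch metrics.
    """
    stream = iter(received)
    return [next(stream) if mask[i % len(mask)] else None for i in range(n_coded)]


coded = [1, 0, 1, 1, 0, 0, 1, 0]             # one period of mother-code output
sent = puncture(coded, MASKS["4/5"])
restored = depuncture(sent, MASKS["4/5"], len(coded))
print(sent)
print(restored)
print(is_rate_compatible(MASKS["4/5"], MASKS["4/6"]))
```

Because each mask is a superset of the next weaker one, moving to a stronger code only requires sending the additional bits, which is what makes the family easy to adapt.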

Adaptive coding through either multiplexing or puncturing can be done for fixed modulation or combined with 
adaptive modulation as a hybrid technique. When the modulation is fixed, typically due to transmitter constraints 
on complexity or peak-to-average power ratio, adaptive coding is often the only practical mechanism to address 
the channel variations [6, 7]. The focus of this chapter is on systems where adaptive modulation is possible, so 
adaptive coding on its own will not be further discussed. 
9.2.5 Hybrid Techniques



Hybrid techniques can adapt multiple parameters of the transmission scheme, including rate, power, coding, and 
instantaneous error probability. In this case joint optimization of the different techniques is used to meet a given 
performance requirement. Rate adaptation is often combined with power adaptation to maximize spectral effi- 
ciency, and we apply this joint optimization to different modulations in subsequent sections. Adaptive modulation 
and coding has been widely investigated in the literature and is currently used in the EGPRS standard for data 
transmission in GSM cellular systems. Specifically, EGPRS uses nine different modulation and coding schemes: 
four different code rates for GMSK modulation and five different code rates for 8PSK modulation [13, 15].



9.3 Variable-Rate Variable-Power MQAM 



In the previous section we discussed general approaches to adaptive modulation and coding. In this section we 
describe a specific form of adaptive modulation where the rate and power of MQAM is varied to maximize spectral 
efficiency while meeting a given instantaneous $P_b$ target. We study this specific form of adaptive modulation since
it provides insight into the benefits of adaptive modulation and, moreover, the same scheme for power and rate 
adaptation that achieves capacity also optimizes this adaptive MQAM design. We will also show that there is a 
constant power gap between the spectral efficiency of this adaptive MQAM technique and capacity in flat-fading, 
and this gap can be partially closed by superimposing a trellis or lattice code on top of the adaptive modulation. 

Consider a family of MQAM signal constellations with a fixed symbol time $T_s$, where $M$ denotes the number
of points in each signal constellation. We assume $T_s = 1/B$ based on ideal Nyquist pulse shaping. Let $\bar{S}$, $N_0$,
$\gamma$, and $\bar{\gamma}$ be as given in our system model. Then the average $E_s/N_0$ equals the average SNR:

\[
\frac{\bar{E}_s}{N_0} = \frac{\bar{S} T_s}{N_0}. \tag{9.5}
\]



The spectral efficiency for fixed M is R/B = log 2 M, the number of bits per symbol. This efficiency is typically 
parameterized by the average transmit power S and the BER of the modulation technique. 



9.3.1 Error Probability Bounds 

In [20] the BER for an AWGN channel with MQAM modulation, ideal coherent phase detection, and SNR $\gamma$ is
bounded by

\[
P_b \le 2e^{-1.5\gamma/(M-1)}. \tag{9.6}
\]

A tighter bound good to within 1 dB for $M \ge 4$ and $0 \le \gamma \le 30$ dB is

\[
P_b \le .2e^{-1.5\gamma/(M-1)}. \tag{9.7}
\]

Note that these expressions are only bounds, so they don't match the error probability expressions from Table 6.1
of Chapter 6. We use these bounds since they are easy to invert, so we can obtain $M$ as a function of the target
$P_b$ and the power adaptation policy, as we will see shortly. Adaptive modulation designs can also be based on
BER expressions that are not invertible or BER simulation results, with numerical inversion used to obtain the
constellation size and SNR associated with a given BER target.
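
To see why (9.7) is convenient, the bound can be inverted in closed form: setting $.2e^{-1.5\gamma/(M-1)}$ equal to the target $P_b$ gives $M = 1 + 1.5\gamma/(-\ln(5P_b))$. The sketch below assumes constant transmit power and treats $M$ as a real number; the restriction to a power-of-two constellation at the end is an added illustrative step, and the function names are not from the text.

```python
import math


def mqam_ber_bound(snr, m):
    """Tight BER bound (9.7) for MQAM at linear SNR `snr` and constellation size `m`."""
    return 0.2 * math.exp(-1.5 * snr / (m - 1.0))


def max_constellation_size(snr, target_ber):
    """Invert (9.7): largest real-valued M whose bound meets the target BER."""
    return 1.0 + 1.5 * snr / (-math.log(5.0 * target_ber))


def bits_per_symbol(snr, target_ber):
    """Illustrative discrete choice: largest integer b with 2**b <= M (0 if M < 2)."""
    m = max_constellation_size(snr, target_ber)
    return int(math.floor(math.log2(m))) if m >= 2.0 else 0


snr = 10.0 ** (20.0 / 10.0)       # 20 dB in linear units
target = 1e-3
m_real = max_constellation_size(snr, target)
print(f"real-valued M = {m_real:.2f}")
print(f"bound at that M = {mqam_ber_bound(snr, m_real):.2e}")
print(f"bits per symbol (power-of-two constellation) = {bits_per_symbol(snr, target)}")
```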

In a fading channel with nonadaptive transmission (constant transmit power and rate), the average BER is
obtained by integrating the BER in AWGN over the fading distribution $p(\gamma)$. Thus, we use the average BER
expression to find the maximum data rate that can achieve a given average BER for a given average SNR. Similarly,
if the data rate and average BER are fixed, we can find the required average SNR to achieve this target, as illustrated
in the next example.

Example 9.3: Find the average SNR required to achieve an average BER of $\bar{P}_b = 10^{-3}$ for nonadaptive BPSK
modulation in Rayleigh fading. What is the spectral efficiency of this scheme?

Solution: From Chapter 6.3.2, BPSK in Rayleigh fading has $\bar{P}_b \approx 1/(4\bar{\gamma})$. Thus, without transmitter adaptation, for
a target average BER of $\bar{P}_b = 10^{-3}$ we require $\bar{\gamma} = 1/(4\bar{P}_b) = 250 = 24$ dB. The spectral efficiency is $R/B =
\log_2 2 = 1$ bps/Hz. We will see that adaptive modulation provides a much higher spectral efficiency at this same
SNR and target BER.
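The arithmetic in Example 9.3 can be checked with a few lines. This is only a sketch of the high-SNR approximation $\bar{P}_b \approx 1/(4\bar{\gamma})$ quoted in the solution; the function name is an illustrative assumption.

```python
import math

def bpsk_rayleigh_required_snr(target_avg_ber):
    """Average SNR (linear) from the approximation P_b ~ 1/(4*avg_snr)."""
    return 1.0 / (4.0 * target_avg_ber)

avg_snr = bpsk_rayleigh_required_snr(1e-3)
avg_snr_db = 10.0 * math.log10(avg_snr)
spectral_efficiency = math.log2(2)   # BPSK carries one bit per symbol
print(avg_snr, avg_snr_db, spectral_efficiency)
```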



9.3.2 Adaptive Rate and Power Schemes 

We now consider adapting the transmit power $S(\gamma)$ relative to $\gamma$, subject to the average power constraint $\bar{S}$ and an
instantaneous BER constraint $P_b(\gamma) = P_b$. The received SNR is then $\gamma S(\gamma)/\bar{S}$, and the $P_b$ bound for each value
of $\gamma$, using the tight bound (9.7), becomes

$$P_b(\gamma) \le .2\exp\left[\frac{-1.5\gamma}{M-1}\,\frac{S(\gamma)}{\bar{S}}\right]. \tag{9.8}$$



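We adjust $M$ and $S(\gamma)$ to maintain the target $P_b$. Rearranging (9.8) yields the following maximum constellation
size for a given $P_b$:

$$M(\gamma) = 1 + \frac{1.5\gamma}{-\ln(5P_b)}\,\frac{S(\gamma)}{\bar{S}} = 1 + K\gamma\,\frac{S(\gamma)}{\bar{S}}, \tag{9.9}$$

where

$$K = \frac{-1.5}{\ln(5P_b)} < 1. \tag{9.10}$$

We maximize spectral efficiency by maximizing

$$E[\log_2 M(\gamma)] = \int \log_2\left(1 + \frac{K\gamma S(\gamma)}{\bar{S}}\right)p(\gamma)\,d\gamma, \tag{9.11}$$

subject to the power constraint

$$\int S(\gamma)p(\gamma)\,d\gamma = \bar{S}. \tag{9.12}$$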



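The power adaptation policy that maximizes (9.11) has the same form as the optimal power adaptation policy
(4.12) that achieves capacity:

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma K} & \gamma \ge \gamma_0/K \\ 0 & \gamma < \gamma_0/K, \end{cases} \tag{9.13}$$

where $\gamma_0/K$ is the optimized cutoff fade depth below which the channel is not used, for $K$ given by (9.10). If we
define $\gamma_K = \gamma_0/K$ and multiply both sides of (9.13) by $K$, we get

$$\frac{KS(\gamma)}{\bar{S}} = \begin{cases} \dfrac{1}{\gamma_K} - \dfrac{1}{\gamma} & \gamma \ge \gamma_K \\ 0 & \gamma < \gamma_K, \end{cases} \tag{9.14}$$

where $\gamma_K$ is a cutoff fade depth below which the channel is not used. This cutoff is obtained by the power constraint

$$\int_{\gamma_K}^{\infty}\left(\frac{1}{\gamma_K} - \frac{1}{\gamma}\right)p(\gamma)\,d\gamma = K. \tag{9.15}$$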

Substituting (9.13) or (9.14) into (9.9) and (9.11) we get that the instantaneous rate is given by

$$\log_2 M(\gamma) = \log_2\left(\frac{\gamma}{\gamma_K}\right) \tag{9.16}$$

and the corresponding average spectral efficiency is given by

$$\frac{R}{B} = \int_{\gamma_K}^{\infty}\log_2\left(\frac{\gamma}{\gamma_K}\right)p(\gamma)\,d\gamma. \tag{9.17}$$
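As a numerical illustration of (9.15) and (9.17), the sketch below assumes Rayleigh fading, so that $p(\gamma) = e^{-\gamma/\bar{\gamma}}/\bar{\gamma}$. Under that assumption the left side of (9.15) reduces to $e^{-\gamma_K/\bar{\gamma}}/\gamma_K - E_1(\gamma_K/\bar{\gamma})/\bar{\gamma}$ and (9.17) reduces to $\log_2(e)\,E_1(\gamma_K/\bar{\gamma})$, where $E_1$ is the exponential integral. The helper names, the Simpson-rule evaluation of $E_1$, and the bisection search are illustrative choices, not the method used for the book's figures.

```python
import math

def exp1(x, n=2000):
    """E1(x) = integral from x to infinity of exp(-t)/t dt (Simpson, s = ln t)."""
    lo, hi = math.log(x), math.log(x + 50.0)   # tail beyond x + 50 is negligible
    h = (hi - lo) / n
    f = lambda s: math.exp(-math.exp(s))
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def power_constraint(gk, avg_snr):
    """Left side of (9.15) for Rayleigh fading with average SNR avg_snr."""
    return math.exp(-gk / avg_snr) / gk - exp1(gk / avg_snr) / avg_snr

def cutoff(avg_snr, K):
    """Bisection for gamma_K; the left side of (9.15) decreases in gamma_K."""
    lo, hi = 1e-6, 1.0 / K          # left side is below 1/gamma_K, so root <= 1/K
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if power_constraint(mid, avg_snr) > K:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def adaptive_mqam_efficiency(avg_snr, target_ber):
    """Returns (R/B from (9.17), gamma_K) for Rayleigh fading."""
    K = -1.5 / math.log(5.0 * target_ber)      # (9.10)
    gk = cutoff(avg_snr, K)
    return math.log2(math.e) * exp1(gk / avg_snr), gk

for snr_db in (10, 20):
    print(snr_db, adaptive_mqam_efficiency(10 ** (snr_db / 10), 1e-3))
```

Setting $K = 1$ in the same routine gives the cutoff of the capacity-achieving policy, which is one way to see the constant power gap discussed in the text.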



Comparing the power adaptation and average spectral efficiency (4.12) and (4.13) associated with the Shannon
capacity of a fading channel with (9.13) and (9.17), the optimal power adaptation and average spectral efficiency
of adaptive MQAM, we see that the power and rate adaptation are the same and lead to the same average spectral
efficiency, with an effective power loss of $K$ for adaptive MQAM as compared to the capacity-achieving scheme.
Moreover, this power loss is independent of the fading distribution. Thus, if the capacity of a fading channel
is $R$ bps/Hz at SNR $\bar{\gamma}$, uncoded adaptive MQAM requires a received SNR of $\bar{\gamma}/K$ to achieve the same rate.
Equivalently, $K$ is the maximum possible coding gain for this variable rate and power MQAM method. We
discuss superimposing a trellis or lattice code on top of adaptive MQAM in Section 9.3.8.

We plot the average spectral efficiency (9.17) of adaptive MQAM at target $P_b$ values of $10^{-3}$ and $10^{-6}$ for both
log-normal shadowing and Rayleigh fading in Figures 9.2 and 9.3, respectively. We also plot the capacity in these
figures for comparison. Note that the gap between the spectral efficiency of variable-rate variable-power MQAM
and capacity is the constant $K$, which from (9.10) is a simple function of the BER.
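Since the gap depends only on the target BER, it is easy to tabulate. The short sketch below evaluates $K$ from (9.10) and expresses it in dB for the two targets used in the figures; the function name is an illustrative assumption.

```python
import math

def power_gap(target_ber):
    """K from (9.10): effective power loss of uncoded adaptive MQAM."""
    return -1.5 / math.log(5.0 * target_ber)

for ber in (1e-3, 1e-6):
    K = power_gap(ber)
    print(f"P_b = {ber:g}: K = {K:.4f} ({10 * math.log10(K):.2f} dB)")
```

The gap grows as the BER target tightens, which matches the larger separation from capacity for the $10^{-6}$ curves.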




Figure 9.2: Average Spectral Efficiency in Log-Normal Shadowing ($\sigma = 8$ dB).



9.3.3 Channel Inversion with Fixed Rate 

We can also apply channel inversion power adaptation to maintain a fixed received SNR. We then transmit a single
fixed-rate MQAM modulation that achieves the target $P_b$. The constellation size $M$ that meets this target $P_b$ is
obtained by substituting the channel inversion power adaptation $S(\gamma)/\bar{S} = \sigma/\gamma$ of (9.2) into (9.9)
with $\sigma = 1/E[1/\gamma]$. Since the resulting spectral efficiency is $R/B = \log_2 M$, this yields the spectral efficiency of the
channel inversion power adaptation as

$$\frac{R}{B} = \log_2\left(1 + \frac{-1.5}{\ln(5P_b)\,E[1/\gamma]}\right). \tag{9.18}$$

Figure 9.3: Average Spectral Efficiency in Rayleigh Fading.



This spectral efficiency is based on the tight bound (9.7); if the resulting $M = 2^{R/B} < 4$, the loose bound (9.6)
must be used, in which case $\ln(5P_b)$ is replaced with $\ln(.5P_b)$ in (9.18).

With truncated channel inversion the channel is only used when $\gamma > \gamma_0$. Thus, the spectral efficiency with
truncated channel inversion is obtained by substituting $S(\gamma)/\bar{S} = \sigma/\gamma$, $\gamma > \gamma_0$, into (9.9) and multiplying by the
probability that $\gamma > \gamma_0$. The maximum value is obtained by optimizing relative to the cutoff level $\gamma_0$:

$$\frac{R}{B} = \max_{\gamma_0}\,\log_2\left(1 + \frac{-1.5}{\ln(5P_b)\,E_{\gamma_0}[1/\gamma]}\right)p(\gamma > \gamma_0). \tag{9.19}$$
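The maximization over the cutoff in (9.19) is one-dimensional, so a coarse grid search is enough to illustrate it. The sketch below assumes Rayleigh fading and takes $E_{\gamma_0}[1/\gamma]$ to mean $\int_{\gamma_0}^{\infty}\gamma^{-1}p(\gamma)\,d\gamma$, which for this distribution equals $E_1(\gamma_0/\bar{\gamma})/\bar{\gamma}$. The grid, the helper names, and the numerical $E_1$ routine are illustrative assumptions.

```python
import math

def exp1(x, n=1000):
    """E1(x) = integral from x to infinity of exp(-t)/t dt (Simpson, s = ln t)."""
    lo, hi = math.log(x), math.log(x + 50.0)
    h = (hi - lo) / n
    f = lambda s: math.exp(-math.exp(s))
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def tci_efficiency(g0, avg_snr, K):
    """Objective inside the max of (9.19) for Rayleigh fading."""
    e_inv = exp1(g0 / avg_snr) / avg_snr          # truncated E[1/gamma]
    return math.log2(1.0 + K / e_inv) * math.exp(-g0 / avg_snr)

def best_cutoff(avg_snr, target_ber, points=200):
    """Grid search over cutoffs from 1e-3 to 10 times the average SNR."""
    K = -1.5 / math.log(5.0 * target_ber)
    grid = [avg_snr * 10 ** (-3 + 4 * i / (points - 1)) for i in range(points)]
    return max((tci_efficiency(g0, avg_snr, K), g0) for g0 in grid)

eff, g0 = best_cutoff(100.0, 1e-3)                # 20 dB average SNR
print(eff, g0, 1.0 - math.exp(-g0 / 100.0))       # efficiency, cutoff, outage
```

The third printed value is the outage probability $p(\gamma < \gamma_0)$ at the chosen cutoff, which is the price paid for the fixed rate.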

The spectral efficiency of adaptive MQAM with the optimal water-filling and truncated channel inversion power
adaptation in a Rayleigh fading channel with a target BER of $10^{-3}$ is shown in Figure 9.4, along with the capacity
under the same two power adaptation policies. We see, surprisingly, that truncated channel inversion with fixed
rate transmission has almost the same spectral efficiency as the optimal variable rate and power MQAM. This
would tend to indicate that truncated channel inversion is more desirable in practice, as it achieves almost the same
spectral efficiency as variable rate and power transmission but does not require varying the rate. However, this
assumes there is no restriction on constellation size. Specifically, the spectral efficiencies (9.17), (9.18), and (9.19)
assume that $M$ can be any real number and that the power and rate can vary continuously with $\gamma$. While MQAM
modulation for noninteger values of $M$ is possible, the complexity is quite high [27]. Moreover, it is difficult in
practice to continually adapt the transmit power and constellation size to the channel fading, particularly in fast
fading environments. Thus, we now consider restricting the constellation size to just a handful of values. This will
clearly impact the spectral efficiency though, as we will show in the next section, not by very much.

9.3.4 Discrete Rate Adaptation 

We now assume the same model as in the previous section but we restrict the adaptive MQAM to a limited set
of constellations. Specifically, we assume a set of square constellations of size $M_0 = 0$, $M_1 = 2$, and $M_j =
2^{2(j-1)}$, $j = 2, \ldots, N-1$ for some $N$. We assume square constellations for $M > 2$ since they are easier to
implement than rectangular constellations [21]. We first analyze the impact of this restriction on the spectral
efficiency of the optimal adaptation policy. We then determine the effect on the channel inversion policies.

Consider a variable-rate variable-power MQAM transmission scheme subject to the constellation restrictions
described above. Thus, at each symbol time we transmit a symbol from a constellation in the set $\{M_j : j =
0, 1, \ldots, N-1\}$: the choice of constellation depends on the fade level $\gamma$ over that symbol time. Choosing the
$M_0$ constellation corresponds to no data transmission. For each value of $\gamma$, we must decide which constellation
to transmit and what the associated transmit power should be. The rate at which the transmitter must change its
constellation and power is analyzed below. Since the power adaptation is continuous while the constellation size
is discrete, we call this a continuous-power discrete-rate adaptation scheme.

Figure 9.4: Spectral Efficiency with Different Power Adaptation Policies (Rayleigh Fading).

We determine the constellation size associated with each $\gamma$ by discretizing the range of channel fade levels.
Specifically, we divide the range of $\gamma$ into $N$ fading regions $R_j = [\gamma_{j-1}, \gamma_j)$, $j = 0, \ldots, N-1$, where $\gamma_{-1} = 0$
and $\gamma_{N-1} = \infty$. We transmit constellation $M_j$ when $\gamma \in R_j$. The spectral efficiency for $\gamma \in R_j$ is thus $\log_2 M_j$
bps/Hz for $j > 0$.

The adaptive MQAM design requires that the boundaries of the $R_j$ regions be determined. While these
boundaries can be optimized to maximize spectral efficiency, as derived in Section 9.4.2, the optimal boundaries
cannot be found in closed form and require an exhaustive search to obtain. Thus, we will use a suboptimal
technique to determine boundaries. These suboptimal boundaries are much easier to find than the optimal ones and
have almost the same performance. Define

$$M(\gamma) = \frac{\gamma}{\gamma_K^*}, \tag{9.20}$$

where $\gamma_K^* > 0$ is a parameter that will later be optimized to maximize spectral efficiency. Note that substituting
(9.13) into (9.9) yields (9.20) with $\gamma_K^* = \gamma_K$. Therefore the appropriate choice of $\gamma_K^*$ in (9.20) defines the optimal
constellation size for each $\gamma$ when there is no constellation restriction.

Assume now that $\gamma_K^*$ is fixed and define $M_N = \infty$. To obtain the constellation size $M_j$, $j = 0, \ldots, N-1$,
for a given SNR $\gamma$, we first compute $M(\gamma)$ from (9.20). We then find $j$ such that $M_j \le M(\gamma) < M_{j+1}$ and assign
constellation $M_j$ to this $\gamma$ value. Thus, for a fixed $\gamma$, we transmit the largest constellation in our set $\{M_j : j =
0, \ldots, N\}$ that is smaller than $M(\gamma)$. For example, if the fade level $\gamma$ satisfies $2 \le \gamma/\gamma_K^* < 4$ we transmit BPSK.
The region boundaries other than $\gamma_{-1} = 0$ and $\gamma_{N-1} = \infty$ are located at $\gamma_j = \gamma_K^* M_{j+1}$, $j = 0, \ldots, N-2$.
Clearly, increasing the number of discrete signal constellations $N$ yields a better approximation to the continuous
adaptation (9.9), resulting in a higher spectral efficiency.
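The selection rule just described amounts to a table lookup on $\gamma/\gamma_K^*$. A minimal sketch follows; the function names are illustrative and $N \ge 2$ is assumed.

```python
def constellation_set(N):
    """M_0 = 0, M_1 = 2, and M_j = 2^(2(j-1)) for j = 2, ..., N-1 (N >= 2)."""
    return [0, 2] + [2 ** (2 * (j - 1)) for j in range(2, N)]

def select_constellation(snr, gk_star, N):
    """Return (j, M_j) with M_j <= snr/gk_star < M_{j+1}, taking M_N = infinity."""
    sizes = constellation_set(N)
    m = snr / gk_star                                   # M(gamma) from (9.20)
    j = max(i for i, Mj in enumerate(sizes) if Mj <= m)
    return j, sizes[j]

# With five regions, a fade level of 3*gk_star selects BPSK and anything
# below 2*gk_star selects M_0 = 0 (no transmission).
print(select_constellation(3.0, 1.0, 5), select_constellation(0.5, 1.0, 5))
```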

Once the regions and associated constellations are fixed we must find a power adaptation policy that satisfies
the BER requirement and the power constraint. By (9.9) we can maintain a fixed BER for the constellation $M_j > 0$
using the power adaptation policy

$$\frac{S_j(\gamma)}{\bar{S}} = \begin{cases} (M_j - 1)\dfrac{1}{\gamma K} & M_j \le \gamma/\gamma_K^* < M_{j+1} \\ 0 & M_j = 0 \end{cases} \tag{9.21}$$

for $\gamma \in R_j$, since this power adaptation policy leads to a fixed received $E_s/N_0$ for the constellation $M_j$ of

$$\frac{E_s(j)}{N_0} = \frac{\gamma S_j(\gamma)}{\bar{S}} = \frac{M_j - 1}{K}. \tag{9.22}$$

By definition of $K$, MQAM modulation with constellation size $M_j$ and $E_s/N_0$ given by (9.22) results in the
desired target $P_b$. In Table 9.1 we tabulate the constellation size and power adaptation as a function of $\gamma$ and $\gamma_K^*$
for 5 fading regions.



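Region (j)    $\gamma$ Range                        $M_j$    $S_j(\gamma)/\bar{S}$
0             $0 \le \gamma/\gamma_K^* < 2$         0        0
1             $2 \le \gamma/\gamma_K^* < 4$         2        $1/(K\gamma)$
2             $4 \le \gamma/\gamma_K^* < 16$        4        $3/(K\gamma)$
3             $16 \le \gamma/\gamma_K^* < 64$       16       $15/(K\gamma)$
4             $64 \le \gamma/\gamma_K^* < \infty$   64       $63/(K\gamma)$

Table 9.1: Rate and Power Adaptation for 5 Regions.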
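A quick way to see why the entries of Table 9.1 hold the BER fixed is to substitute the power ratio $(M_j - 1)/(K\gamma)$ back into the bound (9.8). The sketch below does this for one arbitrary SNR in each region; the SNR values and function names are illustrative assumptions, and for BPSK the text notes that the looser bound (9.6) is the strictly applicable one.

```python
import math

def power_ratio(Mj, snr, K):
    """S_j(gamma)/S-bar from (9.21); zero when nothing is transmitted."""
    return 0.0 if Mj == 0 else (Mj - 1) / (snr * K)

def ber_tight_bound(Mj, snr, ratio):
    """Bound (9.8) at received SNR snr*ratio for constellation size Mj > 1."""
    return 0.2 * math.exp(-1.5 * snr * ratio / (Mj - 1))

target = 1e-3
K = -1.5 / math.log(5.0 * target)                 # (9.10)
for Mj, snr in [(2, 3.0), (4, 10.0), (16, 40.0), (64, 200.0)]:
    ratio = power_ratio(Mj, snr, K)
    print(Mj, ratio, ber_tight_bound(Mj, snr, ratio))   # bound equals target
```

The SNR cancels in the exponent, so the bound evaluates to the target regardless of where in the region the fade level falls.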



The spectral efficiency for this discrete-rate policy is just the sum of the data rates associated with each of the
regions multiplied by the probability that $\gamma$ falls in that region:

$$\frac{R}{B} = \sum_{j=1}^{N-1}\log_2(M_j)\,p(M_j \le \gamma/\gamma_K^* < M_{j+1}). \tag{9.23}$$

Since $M_j$ is a function of $\gamma_K^*$, we can maximize (9.23) relative to $\gamma_K^*$ subject to the power constraint

$$\sum_{j=1}^{N-1}\int_{\gamma_K^* M_j}^{\gamma_K^* M_{j+1}}\frac{S_j(\gamma)}{\bar{S}}\,p(\gamma)\,d\gamma = 1, \tag{9.24}$$

where $S_j(\gamma)/\bar{S}$ is defined in (9.21). There is no closed-form solution for the optimal $\gamma_K^*$: in the calculations below
it was found using numerical search techniques.
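One possible numerical search is sketched below for Rayleigh fading, where each integral in (9.24) reduces to a difference of exponential integrals $E_1$. The sketch assumes the power constraint is met with equality; because the left side of (9.24) decreases monotonically in $\gamma_K^*$ for this distribution, a bisection then suffices. The helper names and the Simpson-rule $E_1$ are illustrative assumptions, not the search procedure used for the book's figures.

```python
import math

def exp1(x, n=2000):
    """E1(x) = integral from x to infinity of exp(-t)/t dt (Simpson, s = ln t)."""
    lo, hi = math.log(x), math.log(x + 50.0)
    h = (hi - lo) / n
    f = lambda s: math.exp(-math.exp(s))
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def power_used(gk, avg_snr, K, sizes):
    """Left side of (9.24) for Rayleigh fading; sizes = [M_0, ..., M_{N-1}]."""
    total = 0.0
    for j in range(1, len(sizes)):
        lower = exp1(gk * sizes[j] / avg_snr)
        upper = exp1(gk * sizes[j + 1] / avg_snr) if j + 1 < len(sizes) else 0.0
        total += (sizes[j] - 1) * (lower - upper)
    return total / (K * avg_snr)

def efficiency(gk, avg_snr, sizes):
    """Discrete-rate spectral efficiency (9.23) for Rayleigh fading."""
    total = 0.0
    for j in range(1, len(sizes)):
        p_low = math.exp(-gk * sizes[j] / avg_snr)
        p_high = math.exp(-gk * sizes[j + 1] / avg_snr) if j + 1 < len(sizes) else 0.0
        total += math.log2(sizes[j]) * (p_low - p_high)
    return total

def solve_cutoff(avg_snr, target_ber, sizes):
    """Bisection for the gamma_K^* that satisfies (9.24) with equality."""
    K = -1.5 / math.log(5.0 * target_ber)
    lo, hi = 1e-6, 1.0
    while power_used(hi, avg_snr, K, sizes) > 1.0:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if power_used(mid, avg_snr, K, sizes) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sizes = [0, 2, 4, 16, 64]
gk = solve_cutoff(10.0, 1e-3, sizes)               # 10 dB average SNR
print(gk, efficiency(gk, 10.0, sizes))
```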

In Figures 9.5 and 9.6 we show the maximum of (9.23) versus the number of fading regions $N$ for log-normal
shadowing and Rayleigh fading, respectively. We assume a BER of $10^{-3}$ for both plots. From Figure 9.5 we see
that restricting our adaptive policy to just 6 fading regions ($M_j = 0, 2, 4, 16, 64, 256$) results in a spectral efficiency
that is within 1 dB of the efficiency obtained with continuous-rate adaptation (9.17) under log-normal shadowing.
A similar result holds for Rayleigh fading using 5 regions ($M_j = 0, 2, 4, 16, 64$).

We can simplify our discrete-rate policy even further by using a constant transmit power for each constellation
$M_j$. Thus, each fading region is associated with one signal constellation and one transmit power. We call this
policy discrete-power discrete-rate adaptive MQAM. Since the transmit power and constellation size are fixed in
each region, the BER will vary with $\gamma$ in each region. Thus, the region boundaries and transmit power must be set
to achieve a given target average BER.

Figure 9.5: Discrete-Rate Efficiency in Log-Normal Shadowing ($\sigma = 8$ dB).

Figure 9.6: Discrete-Rate Efficiency in Rayleigh Fading.

A restriction on allowable signal constellations will also affect the total channel inversion and truncated channel
inversion policies. Specifically, suppose we assume that with the channel inversion policies, the constellation
must be chosen from a fixed set of possible constellations $\mathcal{M} = \{M_0 = 0, \ldots, M_{N-1}\}$. For total channel inversion
the spectral efficiency with this restriction is thus



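$$\frac{R}{B} = \log_2\left(\left\lfloor 1 + \frac{-1.5}{\ln(5P_b)\,E[1/\gamma]}\right\rfloor_{\mathcal{M}}\right), \tag{9.25}$$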



where $\lfloor x \rfloor_{\mathcal{M}}$ denotes the largest number in the set $\mathcal{M}$ less than or equal to $x$. The spectral efficiency with this policy
will be restricted to values of $\log_2 M$, $M \in \mathcal{M}$, with discrete jumps at the $\bar{\gamma}$ values where the spectral efficiency
without constellation restriction (9.18) equals $\log_2 M$. For truncated channel inversion the spectral efficiency is
given by

$$\frac{R}{B} = \max_{\gamma_0}\,\log_2\left(\left\lfloor 1 + \frac{-1.5}{\ln(5P_b)\,E_{\gamma_0}[1/\gamma]}\right\rfloor_{\mathcal{M}}\right)p(\gamma > \gamma_0). \tag{9.26}$$



In Figures 9.7 and 9.8 we show the impact of constellation restriction on adaptive MQAM for the different
power adaptation policies. When the constellation is restricted we assume 6 fading regions, so $\mathcal{M} = \{M_0 =
0, 2, 4, \ldots, 256\}$. The power associated with each fading region for the discrete-power discrete-rate policy has an
average BER equal to the instantaneous BER of the discrete-rate continuous-power adaptive policy. We see from
these figures that for variable-rate MQAM with a small set of constellations, restricting the power to a single value
for each constellation degrades spectral efficiency by about 1-2 dB relative to continuous power adaptation. For
comparison, we also plot the maximum efficiency (9.17) for continuous power and rate adaptation. All discrete-rate
policies have performance that is within 3 dB of this theoretical maximum.

These figures also show the spectral efficiency of fixed-rate transmission with truncated channel inversion
(9.26). The efficiency of this scheme is quite close to that of the discrete-power discrete-rate policy. However, to
achieve this high efficiency, the optimal $\gamma_0$ is quite large, with a corresponding outage probability $P_{out} = p(\gamma <
\gamma_0)$ ranging from .1 to .6. Thus, this policy is similar to packet radio, with bursts of high speed data when the
channel conditions are favorable. The efficiency of total channel inversion (9.25) is also shown for log-normal
shadowing: this efficiency equals zero in Rayleigh fading. We also plot the spectral efficiency of nonadaptive
transmission, where both the transmission rate and power are constant. As discussed in Section 9.3.1, the average
BER in this case is obtained by integrating the probability of error (9.31) against the fade distribution $p(\gamma)$. The
spectral efficiency is obtained by determining the value of $M$ which yields a $10^{-3}$ average BER for the given
value of $\bar{\gamma}$, as was illustrated in Example 9.3. Nonadaptive transmission clearly suffers a large efficiency loss in
exchange for its simplicity. However, if the channel varies rapidly and cannot be accurately estimated, nonadaptive
transmission may be the best alternative. Similar curves can be obtained for a target BER of $10^{-6}$, with roughly
the same spectral efficiency loss relative to a $10^{-3}$ BER as was exhibited in Figures 9.2 and 9.3.




Figure 9.7: Efficiency in Log-Normal Shadowing ($\sigma = 8$ dB).




Figure 9.8: Efficiency in Rayleigh Fading. 

9.3.5 Average Fade Region Duration



The choice of the number of regions to use in the adaptive policy will depend on how fast the channel is changing
as well as on the hardware constraints, which dictate how many constellations are available to the transmitter and at
what rate the transmitter can change its constellation and power. Channel estimation and feedback considerations
along with hardware constraints may dictate that the constellation remains constant over tens or even hundreds of
symbols. In addition, power-amplifier linearity requirements and out-of-band emission constraints may restrict the
rate at which power can be adapted. An in-depth discussion of hardware implementation issues can be found in
[22]. However, determining how long the SNR $\gamma$ remains within a particular fading region $R_j$ is of interest, since
it determines the tradeoff between the number of regions and the rate of power and constellation adaptation. We
now investigate the time duration over which the SNR remains within a given fading region.

Let $\bar{\tau}_j$ denote the average time duration that $\gamma$ stays within the $j$th fading region. Let $A_j = \gamma_K^* M_j$ for $\gamma_K^*$
and $M_j$ as previously defined. The $j$th fading region is then defined as $\{\gamma : A_j \le \gamma < A_{j+1}\}$. We call $\bar{\tau}_j$ the $j$th
average fade region duration (AFRD). This definition is similar to the average fade duration (AFD) (Chapter 3.2.3),
except that the AFD measures the average time that $\gamma$ stays below a single level, whereas we are interested in the
average time that $\gamma$ stays between two levels. For the worst-case region ($j = 0$) these two definitions coincide.

Determining the exact value of $\bar{\tau}_j$ requires a complex derivation based on the joint density $p(\gamma, \dot{\gamma})$, and remains
an open problem. However, a good approximation can be obtained using the finite-state Markov model derived in
[23]. In this model, fading is approximated as a discrete-time Markov process with time discretized to the symbol
period $T_s$. It is assumed that the fade value $\gamma$ remains within one region over a symbol period and from a given
region the process can only transition to the same region or to adjacent regions. Note that this approximation can
lead to longer deep fade durations than more accurate models [24]. The transition probabilities between regions
under this assumption are given as

$$p_{j,j+1} = \frac{N_{j+1}T_s}{\pi_j}, \quad p_{j,j-1} = \frac{N_j T_s}{\pi_j}, \quad p_{j,j} = 1 - p_{j,j+1} - p_{j,j-1}, \tag{9.27}$$

where $N_j$ is the level-crossing rate at $A_j$ and $\pi_j$ is the steady-state distribution corresponding to the $j$th region:
$\pi_j = p(A_j \le \gamma < A_{j+1})$. Since the time over which the Markov process stays in a given state is geometrically
distributed [25, 2.66], $\bar{\tau}_j$ is given by

$$\bar{\tau}_j = \frac{T_s}{p_{j,j+1} + p_{j,j-1}} = \frac{\pi_j}{N_{j+1} + N_j}. \tag{9.28}$$



The value of $\bar{\tau}_j$ is thus a simple function of the level crossing rate and the fading distribution. While the level
crossing rate is known for Rayleigh fading [19, Section 1.3.4], it cannot be obtained for log-normal shadowing
since the joint distribution $p(\gamma, \dot{\gamma})$ for this fading type is unknown.

In Rayleigh fading the level crossing rate is given by

$$N_j = \sqrt{\frac{2\pi A_j}{\bar{\gamma}}}\,f_D\,e^{-A_j/\bar{\gamma}}, \tag{9.29}$$

where $f_D = v/\lambda$ is the Doppler frequency. Substituting (9.29) into (9.28) it is easily seen that $\bar{\tau}_j$ is inversely
proportional to the Doppler frequency. Moreover, since $\pi_j$ and $A_j$ do not depend on $f_D$, if we compute $\bar{\tau}_j$ for a
given Doppler frequency $f_D$, we can compute $\hat{\bar{\tau}}_j$ corresponding to another Doppler frequency $\hat{f}_D$ as

$$\hat{\bar{\tau}}_j = \frac{f_D}{\hat{f}_D}\,\bar{\tau}_j. \tag{9.30}$$

We tabulate below the $\bar{\tau}_j$ values corresponding to five regions ($M_j = 0, 2, 4, 16, 64$) in Rayleigh fading$^2$ for
$f_D = 100$ Hz and two average power levels: $\bar{\gamma} = 10$ dB ($\gamma_K^* = 1.22$) and $\bar{\gamma} = 20$ dB ($\gamma_K^* = 1.685$). The AFRD
for other Doppler frequencies is easily obtained using the table values and (9.30). This table indicates that, even at
high velocities, for symbol rates of 100 kilosymbols/sec the discrete-rate discrete-power policy will maintain the
same constellation and transmit power over tens to hundreds of symbols.



Region (j)    $\bar{\gamma} = 10$ dB    $\bar{\gamma} = 20$ dB
0             2.23 ms                   .737 ms
1             .830 ms                   .301 ms
2             3.00 ms                   1.06 ms
3             2.83 ms                   2.28 ms
4             1.43 ms                   3.84 ms

Table 9.2: Average Fade Region Duration $\bar{\tau}_j$ for $f_D = 100$ Hz.



Example 9.4: Find the AFRDs for a Rayleigh fading channel with $\bar{\gamma} = 10$ dB, $M_j = 0, 2, 4, 16, 64$, and
$\hat{f}_D = 50$ Hz.

Solution: We first note that all parameters are the same as used in the calculation of Table 9.2 except that the
Doppler $\hat{f}_D = 50$ Hz is half the Doppler of $f_D = 100$ Hz used to compute this table. Thus, from (9.30), we obtain
the AFRDs with this new Doppler by multiplying each value in the table by $f_D/\hat{f}_D = 2$.
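The AFRD calculation can be sketched directly from (9.28) and (9.29), using the Rayleigh region probabilities $\pi_j = e^{-A_j/\bar{\gamma}} - e^{-A_{j+1}/\bar{\gamma}}$ and taking the level crossing rate to be zero at $A_0 = 0$ and at $A_N = \infty$. The function name is an illustrative assumption. With $\bar{\gamma} = 10$ dB, $\gamma_K^* = 1.22$, and $f_D = 100$ Hz it should reproduce the first column of Table 9.2 to within rounding, and halving the Doppler doubles every entry as in Example 9.4.

```python
import math

def afrd_rayleigh(avg_snr, gk_star, sizes, f_doppler):
    """Average fade region durations (9.28) with the level crossing rate (9.29)."""
    bounds = [gk_star * M for M in sizes] + [math.inf]     # A_0, ..., A_N

    def level_crossing_rate(A):
        if A == 0 or math.isinf(A):
            return 0.0                                     # no crossings at 0 or infinity
        return math.sqrt(2 * math.pi * A / avg_snr) * f_doppler * math.exp(-A / avg_snr)

    durations = []
    for j in range(len(sizes)):
        low, high = bounds[j], bounds[j + 1]
        pi_j = math.exp(-low / avg_snr) - math.exp(-high / avg_snr)
        durations.append(pi_j / (level_crossing_rate(low) + level_crossing_rate(high)))
    return durations

sizes = [0, 2, 4, 16, 64]
at_100_hz = afrd_rayleigh(10.0, 1.22, sizes, 100.0)
at_50_hz = afrd_rayleigh(10.0, 1.22, sizes, 50.0)
for j, (t1, t2) in enumerate(zip(at_100_hz, at_50_hz)):
    print(j, f"{1e3 * t1:.3f} ms", f"{1e3 * t2:.3f} ms")
```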



In shadow fading we can obtain a coarse approximation for $\bar{\tau}_j$ based on the autocorrelation function $A(\delta) =
\sigma^2_{\psi_{dB}} e^{-\delta/X_c}$. Specifically, we can approximate the AFRD for all regions as $\bar{\tau}_j \approx .1X_c/v$, since then the correlation
between fade levels separated in time by $\bar{\tau}_j$ is .9. Thus, for a small number of regions it is likely that $\gamma$ will remain
within the same region over this time period.



9.3.6 Exact versus Approximate $P_b$

The adaptive policies described in prior sections are based on the BER upper bounds of Section 9.3.1. Since these are
upper bounds, they will lead to a lower BER than the target. We would like to see how the BER achieved with
these policies differs from the target BER. A more accurate value for the BER achieved with these policies can be
obtained by simulation or by using a better approximation for BER than the upper bounds. From (6.24) in Chapter
6, the BER of MQAM with Gray coding at high SNRs is well-approximated by



$$P_b \approx \frac{2(\sqrt{M} - 1)}{\sqrt{M}\log_2 M}\,Q\left(\sqrt{\frac{3\gamma}{M-1}}\right). \tag{9.31}$$



Moreover, for the continuous-power discrete-rate policy, $\gamma = E_s/N_0$ for the $j$th signal constellation is

$$\frac{E_s(j)}{N_0} = \frac{M_j - 1}{K}. \tag{9.32}$$

2 The validity of the finite-state Markov model for Rayleigh fading channels has been confirmed in [26].

Thus, we can obtain a more accurate analytical expression for the average BER associated with our adaptive
policies by averaging over the BER (9.31) for each signal constellation as:

$$\bar{P}_b = \sum_{j=1}^{N-1}\frac{2(\sqrt{M_j} - 1)}{\sqrt{M_j}\log_2 M_j}\,Q\left(\sqrt{\frac{3(M_j - 1)}{K(M_j - 1)}}\right)\int_{\gamma_K^* M_j}^{\gamma_K^* M_{j+1}}p(\gamma)\,d\gamma, \tag{9.33}$$

with $M_N = \infty$.
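To see why the approximation predicts a BER below the target, one can evaluate (9.31) at the $E_s/N_0$ of (9.32) for each square constellation. The sketch below does this for a $10^{-3}$ target; the function names are illustrative assumptions, and BPSK is left out because the text treats it separately through the looser bound (9.6).

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_approx(M, es_n0):
    """High-SNR MQAM approximation (9.31) at symbol SNR es_n0 (linear)."""
    root = math.sqrt(M)
    return 2.0 * (root - 1.0) / (root * math.log2(M)) * qfunc(math.sqrt(3.0 * es_n0 / (M - 1)))

target = 1e-3
K = -1.5 / math.log(5.0 * target)          # (9.10)
for M in (4, 16, 64):
    es_n0 = (M - 1) / K                    # (9.32)
    print(M, ber_approx(M, es_n0))         # each value falls below the target
```

The $Q$-function argument is $\sqrt{3/K}$ for every constellation, so only the leading coefficient changes from one region to the next.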

We plot the analytical expression (9.33) along with the simulated BER for the variable rate and power MQAM
with a target BER of $10^{-3}$ in Figures 9.9 and 9.10 for log-normal shadowing and Rayleigh fading, respectively.
The simulated BER is slightly better than the analytical calculation of (9.33) due to the fact that (9.33) is based
on the nearest neighbor bound and neglects some small terms. Both the simulated and analytical BER are smaller
than the target BER of $10^{-3}$ for $\bar{\gamma} > 10$ dB. The BER bound of $10^{-3}$ breaks down at low SNRs, since (9.7)
is not applicable to BPSK, and we must use the looser bound (9.6). Since the adaptive policy uses the BPSK
constellation often at low SNRs, the $P_b$ will be larger than that predicted from the tight bound (9.7). The fact that
the simulated BER is less than the target at high SNRs implies that the analytical calculations in Figures 9.5 and
9.6 are pessimistic. A slightly higher efficiency could be achieved while still maintaining the target $P_b$ of $10^{-3}$.

Figure 9.9: BER for Log-Normal Shadowing (6 Regions).

Figure 9.10: BER for Rayleigh Fading (5 Regions).

9.3.7 Channel Estimation Error and Delay

In this section we examine the effects of estimation error and delay, where the estimation error $\epsilon = \hat{\gamma}/\gamma \ne 1$ and
the delay $i_d = i_f + i_e \ne 0$. We first consider the estimation error. Suppose the transmitter adapts its power and
rate relative to a target BER $P_b$ based on the channel estimate $\hat{\gamma}$ instead of the true value $\gamma$. From (9.8) the BER is
then bounded by

$$P_b(\gamma, \hat{\gamma}) \le .2\exp\left[\frac{-1.5\gamma}{M(\hat{\gamma}) - 1}\,\frac{S(\hat{\gamma})}{\bar{S}}\right] = .2[5P_b]^{1/\epsilon}, \tag{9.34}$$



where the second equality is obtained by substituting the optimal rate (9.9) and power (9.13) policies. For $\epsilon = 1$,
(9.34) reduces to the target $P_b$. For $\epsilon \ne 1$: $\epsilon > 1$ yields an increase in BER above the target, and $\epsilon < 1$ yields a
decrease in BER.
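The sensitivity in (9.34) is easy to tabulate for a constant estimation error. In the sketch below the error is expressed in dB as $10\log_{10}\epsilon$, and the function name is an illustrative assumption.

```python
import math

def ber_with_estimation_error(target_ber, eps_db):
    """Right side of (9.34) for a constant error eps = 10**(eps_db/10)."""
    eps = 10.0 ** (eps_db / 10.0)
    return 0.2 * (5.0 * target_ber) ** (1.0 / eps)

for target in (1e-3, 1e-6):
    for eps_db in (-1.0, 0.0, 0.5, 1.0):
        print(target, eps_db, ber_with_estimation_error(target, eps_db))
```

Because the target BER enters through an exponent, a tighter target is more sensitive to the same error in dB.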

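The effect of estimation error on BER is given by

$$\bar{P}_b \le \int_{\gamma_K}^{\infty}\int_0^{\infty}.2[5P_b]^{\gamma/\hat{\gamma}}\,p(\gamma, \hat{\gamma})\,d\gamma\,d\hat{\gamma} = \int_0^{\infty}.2[5P_b]^{1/\epsilon}\,p(\epsilon)\,d\epsilon. \tag{9.35}$$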



The distribution $p(\epsilon)$ is a function of the joint distribution $p(\gamma, \hat{\gamma})$, which in turn depends on the channel estimation
technique. It has been shown recently that when the channel is estimated using pilot symbols, the joint distribution
of the signal envelope and its estimate is bi-variate Rayleigh [28]. This joint distribution was then used in [28]
to obtain the probability of error for nonadaptive modulation with channel estimation errors. This analysis can be
extended to adaptive modulation using a similar methodology.

If the estimation error stays within some finite range then we can bound the effect of estimation error using
(9.34). We plot the BER increase as a function of a constant $\epsilon$ in Figure 9.11. This figure shows that for a target
BER of $10^{-3}$ the estimation error should be less than 1 dB, and for a target BER of $10^{-6}$ it should be less than .5
dB. These values are pessimistic, since they assume a constant value of estimation error. Even so, the estimation
error can be kept within this range using the pilot-symbol assisted estimation technique described in [18] with
appropriate choice of parameters. When the channel is underestimated ($\epsilon < 1$) the BER decreases but there will
also be some loss in spectral efficiency, since the mean of the channel estimate $\hat{\gamma}$ will differ from the true mean $\bar{\gamma}$.
The effect of this average power estimation error is characterized in [29].




Figure 9.11: Effect of Estimation Error on BER.

Suppose now that the channel is estimated perfectly ($\epsilon = 1$) but the delay $i_d$ of the estimation and feedback
path is nonzero. Thus, at time $i$ the transmitter will use the delayed version of the channel estimate $\hat{\gamma}[i] = \gamma[i - i_d]$
to adjust its power and rate. It was shown in [30] that, conditioned on the outdated channel estimates, the received
signal follows a Ricean distribution, and the probability of error can then be computed by averaging over the
distribution of the estimates. Moreover, [30] develops adaptive coding designs to mitigate the effect of estimation
delay on the performance of adaptive modulation. Alternatively, channel prediction can be used to mitigate these
effects [31].

Figure 9.12: Effect of Normalized Delay ($i_d f_D$) on BER.

The increase in BER from estimation delay can also be examined in the same manner as in (9.34). Given the
exact channel SNR $\gamma[i]$ and its delayed value $\gamma[i - i_d]$, we have

$$P_b(\gamma[i], \gamma[i - i_d]) \le .2\exp\left[\frac{-1.5\gamma[i]}{M(\gamma[i - i_d]) - 1}\,\frac{S(\gamma[i - i_d])}{\bar{S}}\right] = .2[5P_b]^{\gamma[i]/\gamma[i - i_d]}. \tag{9.36}$$



Define $\xi[i, i_d] = \gamma[i]/\gamma[i - i_d]$. Since $\gamma[i]$ is stationary and ergodic, the distribution of $\xi[i, i_d]$ conditioned on
$\gamma[i]$ depends only on $i_d$ and the value of $\gamma = \gamma[i]$. We denote this distribution by $p_{i_d}(\xi|\gamma)$. The average BER is
obtained by integrating over $\xi$ and $\gamma$. Specifically, it is shown in [32] that

$$\bar{P}_b[i_d] = \int_{\gamma_K}^{\infty}\left[\int_0^{\infty}.2[5P_b]^{\xi}\,p_{i_d}(\xi|\gamma)\,d\xi\right]p(\gamma)\,d\gamma, \tag{9.37}$$



where $\gamma_K$ is the cutoff level of the optimal policy and $p(\gamma)$ is the fading distribution. The distribution $p_{i_d}(\xi|\gamma)$ will
depend on the autocorrelation of the fading process. A closed-form expression for $p_{i_d}(\xi|\gamma)$ in Nakagami fading (of
which Rayleigh fading is a special case) is derived in [32]. Using this distribution in (9.37) we obtain the average
BER in Rayleigh fading as a function of the delay parameter $i_d$. A plot of (9.37) versus the normalized time delay
$i_d f_D$ is shown in Figure 9.12. From this figure we see that the total estimation and feedback path delay must be
kept to within $.001/f_D$ to keep the BER near its desired target.



9.3.8 Adaptive Coded Modulation 

Additional coding gain can be achieved with adaptive modulation by superimposing trellis codes or more general 
coset codes on top of the adaptive modulation. Specifically, by using the subset partitioning inherent to coded 
modulation, trellis or lattice codes designed for AWGN channels can be superimposed directly onto the adaptive
modulation with the same approximate coding gain. The basic idea of adaptive coded modulation is to exploit the 
separability of code and constellation design inherent to coset codes, as described in Chapter 8.7. 

Coded modulation is a natural coding scheme to use with variable-rate variable-power MQAM, since the
channel coding gain is essentially independent of the modulation. We can therefore adjust the power and rate 
(number of levels or signal points) in the transmit constellation relative to the instantaneous SNR without affecting 
the channel coding gain, as we now describe in more detail. 

The coded modulation scheme is shown in Figure 9.13. The coset code design is the same as it would 
be for an AWGN channel, i.e., the lattice structure and conventional encoder follow the trellis or lattice coding 
designs outlined in Section 8.7. Let $G_c$ denote the coding gain of the coset code, as given by (8.78). The source
coding (modulation) works as follows. The signal constellation is a square lattice with an adjustable number of 
constellation points M. The size of the MQAM signal constellation from which the signal point is selected is 
determined by the transmit power, which is adjusted relative to the instantaneous SNR and the desired BER, as in 
the uncoded case above. 



Figure 9.13: Adaptive Coded Modulation Scheme.



Specifically, if the BER approximation (9.7) is adjusted for the coding gain, then for a particular SNR $= \gamma$,

$$P_b \approx .2e^{-1.5(\gamma G_c/(M-1))}, \tag{9.38}$$



where $M$ is the size of the transmit signal constellation. As in the uncoded case, using the tight bound (9.7) we
can adjust the number of constellation points $M$ and signal power relative to the instantaneous SNR to maintain a
fixed BER:

$$M(\gamma) = 1 + \frac{1.5\gamma G_c}{-\ln(5P_b)}\,\frac{S(\gamma)}{\bar{S}}. \tag{9.39}$$



The number of uncoded bits required to select the coset point is $n(\gamma) - 2k/N = \log_2 M(\gamma) - 2(k + r)/N$. Since
this value varies with time, these uncoded bits must be queued until needed, as shown in Figure 9.13.

The bit rate per transmission is $\log_2 M(\gamma)$, and the data rate is $\log_2 M(\gamma) - 2r/N$. Therefore, we maximize
the data rate by maximizing $E[\log_2 M]$ relative to the average power constraint. From this maximization, we
obtain the optimal power adaptation policy for this modulation scheme:

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma K_c} & \gamma \ge \gamma_0/K_c \\ 0 & \gamma < \gamma_0/K_c, \end{cases} \tag{9.40}$$



where $\gamma_0$ is the cutoff fade depth, and $K_c = KG_c$ for $K$ given by (9.10). This is the same as the optimal policy
for the uncoded case (9.13), with $K$ replaced by $K_c$. Thus, the coded modulation increases the effective transmit
power by $G_c$, relative to the uncoded variable-rate variable-power MQAM performance. The adaptive data rate is
obtained by substituting (9.40) into (9.39) to get

$$M(\gamma) = \frac{\gamma}{\gamma_{K_c}}. \tag{9.41}$$

The resulting spectral efficiency is

$$\frac{R}{B} = \int_{\gamma_{K_c}}^{\infty}\log_2\left(\frac{\gamma}{\gamma_{K_c}}\right)p(\gamma)\,d\gamma, \tag{9.42}$$

where 7 k c = 70 /K c . If the constellation expansion factor is not included in the coding gain G c , then we must 
subtract 2r/N from (9.42) to get the data rate. More details on this adaptive coded modulation scheme can be found 
in [34], along with plots of the spectral efficiency for adaptive trellis coded modulation of varying complexity. 
These results indicate that adaptive trellis coded modulation can achieve within 5 dB of Shannon capacity at 
reasonable complexity, and that the coding gains of superimposing a given trellis code onto uncoded adaptive 
modulation are roughly equal to the coding of the trellis code in an AWGN channel. 
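The policy in (9.39) and (9.40) can be sketched numerically as follows. This is a minimal illustration, not the scheme of [34]: the function name and its arguments are hypothetical, all SNR values are assumed to be in linear units, and the cutoff parameter is passed in rather than solved from the average power constraint.

```python
import math

def coded_mqam_policy(gamma, gamma0, target_ber, coding_gain):
    """Return (S/S_bar, M) at SNR gamma under (9.39)-(9.40).

    gamma0 is the cutoff parameter; it is an input here (an assumption of
    this sketch) instead of being found from the average power constraint.
    """
    k_uncoded = -1.5 / math.log(5.0 * target_ber)  # K for the tight bound (9.7)
    k_coded = k_uncoded * coding_gain              # K_c = K * G_c
    if gamma < gamma0 / k_coded:
        return 0.0, 1.0                            # no power; M = 1 carries zero bits
    power_ratio = 1.0 / gamma0 - 1.0 / (gamma * k_coded)  # (9.40)
    m = 1.0 + k_coded * gamma * power_ratio                # (9.39)
    return power_ratio, m
```

Above the cutoff the returned constellation size reduces to the SNR divided by the scaled cutoff, which is the substitution step described in the text.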




9.4 General M-ary Modulations

The variable rate and power techniques described above for MQAM can be applied to other $M$-ary modulations.
For any modulation, the basic premise is the same: the transmit power and constellation size are adapted to maintain
a given fixed instantaneous BER for each symbol while maximizing average data rate. In this section we will
consider optimal rate and power adaptation for both continuous-rate and discrete-rate adaptation for general $M$-ary
modulations.



9.4.1 Continuous Rate Adaptation 

We first consider the case where both rate and power can be adapted continuously. We want to find the optimal
power $S(\gamma)$ and rate $k(\gamma) = \log_2 M(\gamma)$ adaptation for general $M$-ary modulation that maximizes the average data
rate $E[k(\gamma)]$ with average power $\bar{S}$ while meeting a given BER target. This optimization is simplified when the
exact or approximate probability of bit error for the modulation can be written in the following form:

$$P_b(\gamma) \approx c_1 \exp\left[\frac{-c_2\gamma\frac{S(\gamma)}{\bar{S}}}{2^{c_3 k(\gamma)} - c_4}\right], \tag{9.43}$$

where $c_1$, $c_2$, and $c_3$ are positive fixed constants, and $c_4$ is a real constant. For example, in the BER bounds for
MQAM given by (9.6) and (9.7), $c_1 = 2$ or $0.2$, $c_2 = 1.5$, $c_3 = 1$, and $c_4 = 1$. The probability of bit error for most
$M$-ary modulations can be approximated in this form with appropriate curve fitting.

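The advantage of (9.43) is that, when $P_b(\gamma)$ is in this form, we can invert it to express the rate $k(\gamma)$ as a
function of the power adaptation $S(\gamma)$ and the BER target $P_b$ as follows:

$$k(\gamma) = \log_2 M(\gamma) = \begin{cases} \frac{1}{c_3}\log_2\left[c_4 - \frac{c_2\gamma}{\ln(P_b/c_1)}\frac{S(\gamma)}{\bar{S}}\right] & S(\gamma) \ge 0,\ k(\gamma) \ge 0 \\ 0 & \text{else.} \end{cases} \tag{9.44}$$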
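The inversion from (9.43) to (9.44) can be checked with a short round trip. The sketch below is only illustrative: the function names are hypothetical, SNR is in linear units, and the example constants are the tight MQAM bound values quoted in the text.

```python
import math

def ber_general(gamma, power_ratio, k, c1, c2, c3, c4):
    """General BER form (9.43); power_ratio is S(gamma)/S_bar."""
    return c1 * math.exp(-c2 * gamma * power_ratio / (2.0 ** (c3 * k) - c4))

def rate_from_power(gamma, power_ratio, target_ber, c1, c2, c3, c4):
    """Rate k(gamma) from (9.44); returns 0 when the constraints fail."""
    arg = c4 - c2 * gamma * power_ratio / math.log(target_ber / c1)
    if power_ratio < 0.0 or arg <= 1.0:
        return 0.0  # log2(arg) <= 0 would give a nonpositive rate
    return math.log2(arg) / c3
```

Feeding the rate returned by `rate_from_power` back into `ber_general` with the same power ratio reproduces the target BER, which is what the inversion promises.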



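To find the power and rate adaptation that maximize spectral efficiency $E[k(\gamma)]$, we create the Lagrangian

$$J(S(\gamma)) = \int_0^\infty k(\gamma)p(\gamma)\,d\gamma + \lambda\left[\int_0^\infty S(\gamma)p(\gamma)\,d\gamma - \bar{S}\right]. \tag{9.45}$$

The optimal adaptation policy maximizes this Lagrangian with nonnegative rate and power, so it satisfies

$$\frac{\partial J}{\partial S(\gamma)} = 0, \quad S(\gamma) \ge 0, \quad k(\gamma) \ge 0. \tag{9.46}$$

Solving (9.46) for $S(\gamma)$ with (9.44) for $k(\gamma)$ yields the optimal power adaptation

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} -\frac{1}{c_3(\ln 2)\lambda\bar{S}} - \frac{1}{\gamma K} & S(\gamma) \ge 0,\ k(\gamma) \ge 0 \\ 0 & \text{else,} \end{cases} \tag{9.47}$$

where

$$K = -\frac{c_2}{c_4\ln(P_b/c_1)}. \tag{9.48}$$

The power adaptation (9.47) can be written in the more simplified form

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \mu - \frac{1}{\gamma K} & S(\gamma) \ge 0,\ k(\gamma) \ge 0 \\ 0 & \text{else.} \end{cases} \tag{9.49}$$

The constant $\mu$ in (9.49) is determined from the average power constraint (9.12).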
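The step from the Lagrangian (9.45) to the power adaptation (9.47) is compressed in the text. The following worked derivation is a sketch of one way to fill it in, assuming $c_4 \neq 0$ so that $K$ in (9.48) is finite. The intermediate rewriting of (9.44) in terms of $K$ is bookkeeping added here, not a formula from the original.

```latex
% Rewrite (9.44) using K = -c_2 / (c_4 \ln(P_b/c_1)):
k(\gamma) = \frac{1}{c_3}\log_2\left[c_4\left(1 + K\gamma\frac{S(\gamma)}{\bar{S}}\right)\right]

% Differentiate the integrand of (9.45) with respect to S(\gamma) and set it to zero:
\frac{1}{c_3\ln 2}\,\frac{K\gamma/\bar{S}}{1 + K\gamma S(\gamma)/\bar{S}} + \lambda = 0

% Solve for S(\gamma)/\bar{S}:
1 + K\gamma\frac{S(\gamma)}{\bar{S}} = -\frac{K\gamma}{c_3(\ln 2)\lambda\bar{S}}
\quad\Longrightarrow\quad
\frac{S(\gamma)}{\bar{S}} = -\frac{1}{c_3(\ln 2)\lambda\bar{S}} - \frac{1}{\gamma K}
```

Identifying $\mu = -1/(c_3(\ln 2)\lambda\bar{S})$ then gives the simplified form (9.49).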

Although the analytical expression for the optimal power adaptation (9.49) looks simple, its behavior is highly
dependent on the $c_4$ values in the $P_b$ approximation (9.43). For (9.43) given by the MQAM approximations (9.6)
or (9.7) the power adaptation is the water-filling formula given by (9.13). However, water-filling is not optimal in
all cases, as we now show.

Based on (6.18) from Chapter 6, with Gray coding the BER for MPSK is tightly approximated as

$$P_b \approx \frac{2}{\log_2 M}\, Q\left(\sqrt{2\gamma}\,\sin(\pi/M)\right). \tag{9.50}$$

However, (9.50) is not in the desired form (9.43). In particular, the $Q$ function is not easily inverted to obtain the
optimal rate and power adaptation for a given target BER. Let us therefore consider the following three $P_b$ bounds
for MPSK, which are valid for $k(\gamma) \ge 2$:



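$$\text{Bound 1:}\quad P_b(\gamma) \approx 0.05 \exp\left[\frac{-6\gamma\frac{S(\gamma)}{\bar{S}}}{2^{1.9k(\gamma)} - 1}\right]. \tag{9.51}$$

$$\text{Bound 2:}\quad P_b(\gamma) \approx 0.2 \exp\left[\frac{-7\gamma\frac{S(\gamma)}{\bar{S}}}{2^{1.9k(\gamma)} + 1}\right]. \tag{9.52}$$

$$\text{Bound 3:}\quad P_b(\gamma) \approx 0.25 \exp\left[\frac{-8\gamma\frac{S(\gamma)}{\bar{S}}}{2^{1.94k(\gamma)}}\right]. \tag{9.53}$$

The bounds are plotted in Figure 9.14 along with the tight approximation (9.50). We see that all bounds
approximate the exact BER well (given by (6.45) in Chapter 6), especially at high SNRs.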

In the first bound (9.51), $c_1 = 0.05$, $c_2 = 6$, $c_3 = 1.9$, and $c_4 = 1$. Thus, in (9.49), $K = -\frac{c_2}{c_4\ln(P_b/c_1)}$ is
positive as long as the target $P_b$ is less than $0.05$, which we assume. Therefore $\mu$ must be positive for the power
adaptation $\frac{S(\gamma)}{\bar{S}} = \mu - \frac{1}{\gamma K}$ to be positive above a cutoff SNR $\gamma_0$. Moreover, for $K$ positive, $k(\gamma) \ge 0$ for any
$S(\gamma) \ge 0$. Thus, with $\mu$ and $k(\gamma)$ positive (9.49) can be expressed as

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \frac{1}{\gamma_0 K} - \frac{1}{\gamma K} & S(\gamma) \ge 0 \\ 0 & \text{else,} \end{cases} \tag{9.54}$$




Figure 9.14: BER Bounds for MPSK (BER versus SNR in dB; curves: exact BER, tight approximation, and Bounds 1 to 3).



where $\gamma_0 \ge 0$ is a cutoff fade depth below which no signal is transmitted. Like $\mu$, this cutoff value is determined
by the average power constraint (9.12). The power adaptation (9.54) is the same water-filling as in adaptive MQAM
given by (9.13), which results from the similarity of the MQAM $P_b$ bounds (9.7) and (9.6) to the MPSK bound
(9.51). The corresponding optimal rate adaptation, obtained by substituting (9.54) into (9.44), is

$$k(\gamma) = \begin{cases} \frac{1}{c_3}\log_2\left(\frac{\gamma}{\gamma_0}\right) & \gamma \ge \gamma_0 \\ 0 & \text{else,} \end{cases} \tag{9.55}$$

which is also in the same form as the adaptive MQAM rate adaptation (9.16).

Let us now consider the second bound (9.52). Here $c_1 = 0.2$, $c_2 = 7$, $c_3 = 1.9$, and $c_4 = -1$. Thus,
$K = -\frac{c_2}{c_4\ln(P_b/c_1)}$ is negative for a target $P_b < 0.2$, which we assume. From (9.44), with $K$ negative we must have
$\mu \ge 0$ in (9.49) to make $k(\gamma) \ge 0$. Then the optimal power adaptation such that $S(\gamma) \ge 0$ and $k(\gamma) \ge 0$ becomes

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \mu - \frac{1}{\gamma K} & k(\gamma) \ge 0 \\ 0 & \text{else.} \end{cases} \tag{9.56}$$

From (9.44) the optimal rate adaptation then becomes

$$k(\gamma) = \begin{cases} \frac{1}{c_3}\log_2\left(\frac{\gamma}{\gamma_0}\right) & \gamma \ge \gamma_0 \\ 0 & \text{else,} \end{cases} \tag{9.57}$$

where $\gamma_0 = -\frac{1}{K\mu}$ is a cutoff fade depth below which the channel is not used. Note that for the first bound (9.51)
the positivity constraint on power ($S(\gamma) \ge 0$) dictates the cutoff fade depth, whereas for this bound the positivity
constraint on rate ($k(\gamma) \ge 0$) determines the cutoff. We can rewrite (9.56) in terms of $\gamma_0$ as

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \frac{1}{\gamma_0(-K)} + \frac{1}{\gamma(-K)} & \gamma \ge \gamma_0 \\ 0 & \text{else.} \end{cases} \tag{9.58}$$

This power adaptation is an inverse water-filling: since $K$ is negative, less power is used as the channel SNR
increases above the optimized cutoff fade depth $\gamma_0$. As usual, the value of $\gamma_0$ is obtained based on the average
power constraint (9.12).

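Finally, for the third bound (9.53), $c_1 = 0.25$, $c_2 = 8$, $c_3 = 1.94$, and $c_4 = 0$. Thus, $K = -\frac{c_2}{c_4\ln(P_b/c_1)} = \infty$
for a target $P_b < 0.25$, which we assume. From (9.49), the optimal power adaptation becomes

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \mu & k(\gamma) \ge 0,\ S(\gamma) \ge 0 \\ 0 & \text{else.} \end{cases} \tag{9.59}$$

This is on-off power transmission: either power is zero or a constant nonzero value. From (9.44) the optimal rate
adaptation $k(\gamma)$ with this power adaptation is

$$k(\gamma) = \begin{cases} \frac{1}{c_3}\log_2\left(\frac{\gamma}{\gamma_0}\right) & \gamma \ge \gamma_0 \\ 0 & \text{else,} \end{cases} \tag{9.60}$$

where $\gamma_0 = -\frac{\ln(P_b/c_1)}{c_2\mu}$ is a cutoff fade depth below which the channel is not used. As for the previous bound, it is
the rate positivity constraint that determines the cutoff fade depth $\gamma_0$. The optimal power adaptation as a function
of $\gamma_0$ is

$$\frac{S(\gamma)}{\bar{S}} = \begin{cases} \frac{K_0}{\gamma_0} & \gamma \ge \gamma_0 \\ 0 & \text{else,} \end{cases} \tag{9.61}$$

where $K_0 = -\frac{\ln(P_b/c_1)}{c_2}$. The value of $\gamma_0$ is determined from the average power constraint to satisfy

$$\frac{K_0}{\gamma_0}\int_{\gamma_0}^\infty p(\gamma)\,d\gamma = 1. \tag{9.62}$$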



Thus, for all three $P_b$ approximations in MPSK, the optimal adaptive rate schemes (9.55), (9.57), and (9.60)
have the same form while the optimal adaptive power schemes (9.54), (9.58), and (9.61) have different forms. The
optimal power adaptations (9.54), (9.58), and (9.61) are plotted in Figure 9.15 for Rayleigh fading with a target BER of
$10^{-3}$ and $\bar\gamma = 30$ dB. This figure clearly shows the water-filling, inverse water-filling, and on-off behavior of the
different schemes. Note that the cutoff $\gamma_0$ for all these schemes is roughly the same. We also see from this figure
that even though the power adaptation schemes are different at low SNRs, they are almost the same at high SNRs.
Specifically we see that for $\gamma < 10$ dB, the optimal transmit power adaptations are dramatically different, while
for $\gamma \ge 10$ dB they rapidly converge to the same constant value. From the cumulative distribution function of $\gamma$ also
shown in Figure 9.15, the probability that $\gamma$ is less than 10 dB is 0.01. Thus, although the optimal power adaptation
corresponding to low SNRs is very different for the different techniques, this behavior has little impact on spectral
efficiency since the probability of being at those low SNRs is quite small.
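For the on-off policy, the cutoff condition (9.62) has a simple form in Rayleigh fading, because the tail probability of an exponentially distributed SNR is available in closed form. The sketch below solves it by bisection; the function names, the bisection bracket, and the iteration count are assumptions of this example, and SNRs are in linear units.

```python
import math

def on_off_cutoff(target_ber, mean_snr, c1=0.25, c2=8.0, iterations=200):
    """Solve (9.62) for gamma_0 in Rayleigh fading by bisection.

    For exponentially distributed gamma the integral in (9.62) equals
    exp(-gamma_0 / mean_snr), so gamma_0 solves K0 * exp(-g / mean_snr) = g.
    Defaults for c1 and c2 are the constants of the third MPSK bound (9.53).
    """
    k0 = -math.log(target_ber / c1) / c2
    lo, hi = 0.0, k0  # the left side exceeds g at 0 and is below g at K0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if k0 * math.exp(-mid / mean_snr) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), k0

def rayleigh_cdf(gamma, mean_snr):
    """P(SNR < gamma) for exponentially distributed SNR."""
    return 1.0 - math.exp(-gamma / mean_snr)
```

With an average SNR of 30 dB (1000 in linear units) and a target BER of $10^{-3}$, `rayleigh_cdf(10.0, 1000.0)` evaluates to about 0.01, matching the probability quoted in the text for SNRs below 10 dB.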



Figure 9.15: Power Adaptation for MPSK BER Bounds (Rayleigh fading, $P_b = 10^{-3}$, $\bar\gamma = 30$ dB).

9.4.2 Discrete Rate Adaptation

We now assume a given discrete set of constellations $\mathcal{M} = \{M_0 = 0, \ldots, M_{N-1}\}$, where $M_0$ corresponds to
no data transmission. The rate corresponding to each of these constellations is $k_j = \log_2 M_j$, $j = 0, \ldots, N-1$,
where $k_0 = 0$. Each rate $k_j$, $j > 0$, is assigned to a fading region of $\gamma$ values $R_j = [\gamma_{j-1}, \gamma_j)$, $j = 0, \ldots, N-1$,
for $\gamma_{-1} = 0$ and $\gamma_{N-1} = \infty$. The boundaries $\gamma_j$, $j = 0, \ldots, N-2$, are optimized as part of the adaptive policy.
The channel is not used for $\gamma < \gamma_0$. We again assume that $P_b$ is approximated using the general formula (9.43).
Then the power adaptation that maintains the target BER above the cutoff $\gamma_0$ is

$$\frac{S_j(\gamma)}{\bar{S}} = \frac{h(k_j)}{\gamma}, \quad \gamma_{j-1} \le \gamma \le \gamma_j, \tag{9.63}$$

where

$$h(k_j) = -\frac{\ln(P_b/c_1)}{c_2}\left(2^{c_3k_j} - c_4\right). \tag{9.64}$$



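The region boundaries $\gamma_0, \ldots, \gamma_{N-2}$ that maximize spectral efficiency are found using the Lagrange equation

$$J(\gamma_0, \gamma_1, \ldots, \gamma_{N-2}) = \sum_{j=1}^{N-1} k_j\int_{\gamma_{j-1}}^{\gamma_j} p(\gamma)\,d\gamma + \lambda\left[\sum_{j=1}^{N-1}\int_{\gamma_{j-1}}^{\gamma_j}\frac{h(k_j)}{\gamma}\,p(\gamma)\,d\gamma - 1\right]. \tag{9.65}$$

The optimal rate region boundaries are obtained by solving the following equation for $\gamma_j$:

$$\frac{\partial J}{\partial\gamma_j} = 0, \quad 0 \le j \le N-2. \tag{9.66}$$

This yields

$$\gamma_0 = \frac{h(k_1)}{k_1}\,\rho \tag{9.67}$$

and

$$\gamma_j = \frac{h(k_{j+1}) - h(k_j)}{k_{j+1} - k_j}\,\rho, \quad 1 \le j \le N-2, \tag{9.68}$$

where $\rho$ is determined by the average power constraint

$$\sum_{j=1}^{N-1}\int_{\gamma_{j-1}}^{\gamma_j}\frac{h(k_j)}{\gamma}\,p(\gamma)\,d\gamma = 1. \tag{9.69}$$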
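The boundary formulas (9.67) and (9.68) leave a single unknown, the constant set by the power constraint (9.69). The following sketch finds it by bisection for Rayleigh fading. Everything beyond the formulas themselves is an assumption of this example: the function names, the midpoint-rule integration in the log domain, the truncation of the upper tail, and the bisection bracket. SNRs are in linear units.

```python
import math

def h(k, target_ber, c1, c2, c3, c4):
    """h(k_j) from (9.64)."""
    return -math.log(target_ber / c1) / c2 * (2.0 ** (c3 * k) - c4)

def region_boundaries(rates, h_vals, rho):
    """gamma_0..gamma_{N-2} from (9.67)-(9.68); rates = [k_1, ..., k_{N-1}]."""
    bounds = [h_vals[0] / rates[0] * rho]
    for j in range(1, len(rates)):
        slope = (h_vals[j] - h_vals[j - 1]) / (rates[j] - rates[j - 1])
        bounds.append(slope * rho)
    return bounds

def inverse_snr_integral(a, b, mean_snr, steps=1000):
    """Midpoint estimate of the integral of p(gamma)/gamma over [a, b] for
    exponentially distributed gamma, using the substitution gamma = e^u."""
    b = min(b, 60.0 * mean_snr)  # the tail beyond this point is negligible
    if b <= a:
        return 0.0
    lo, hi = math.log(a), math.log(b)
    du = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        g = math.exp(lo + (i + 0.5) * du)
        total += math.exp(-g / mean_snr) / mean_snr
    return total * du

def average_power(rates, h_vals, rho, mean_snr):
    """Left side of (9.69) for a trial value of rho."""
    bounds = region_boundaries(rates, h_vals, rho) + [math.inf]
    return sum(h_vals[j] * inverse_snr_integral(bounds[j], bounds[j + 1], mean_snr)
               for j in range(len(rates)))

def solve_rho(rates, h_vals, mean_snr, iterations=60):
    """Bisection on log(rho): larger rho raises every boundary and uses less power."""
    lo, hi = math.log(1e-6), math.log(1e6)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if average_power(rates, h_vals, math.exp(mid), mean_snr) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))
```

A call such as `solve_rho([2.0, 4.0, 6.0, 8.0], h_vals, 100.0)`, with `h_vals` built from `h` using the tight MQAM bound constants, returns the scaling constant, and `region_boundaries` then gives the switching thresholds.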



9.4.3 Average BER Target 

Suppose now that we relax our assumption that the $P_b$ target must be met on every symbol transmission, and
instead require just the average $P_b$ be below some target average $\bar{P}_b$. In this case, in addition to adapting rate and
power, we can also adapt the instantaneous $P_b(\gamma)$ subject to the average constraint $\bar{P}_b$. This gives an additional
degree of freedom in adaptation that may lead to higher spectral efficiencies. We define the average probability of
error for adaptive modulation as

$$\bar{P}_b = \frac{E[\text{number of bits in error per transmission}]}{E[\text{number of bits per transmission}]}. \tag{9.70}$$

When the bit rate $k(\gamma)$ is continuously adapted this becomes

$$\bar{P}_b = \frac{\int_0^\infty P_b(\gamma)k(\gamma)p(\gamma)\,d\gamma}{\int_0^\infty k(\gamma)p(\gamma)\,d\gamma}, \tag{9.71}$$

and when $k(\gamma)$ takes values in a discrete set this becomes

$$\bar{P}_b = \frac{\sum_{j=0}^{N-1} k_j\int_{\gamma_{j-1}}^{\gamma_j} P_b(\gamma)p(\gamma)\,d\gamma}{\sum_{j=0}^{N-1} k_j\int_{\gamma_{j-1}}^{\gamma_j} p(\gamma)\,d\gamma}. \tag{9.72}$$
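The definition (9.70) is a ratio of expectations, so regions that carry more bits weigh more heavily in the average. A small sketch of the discrete-rate form (9.72) makes this concrete. The function name and the per-region inputs (region probabilities and the mean BER within each region) are hypothetical conveniences for this illustration.

```python
def average_ber(region_probs, rates, region_bers):
    """Average BER in the sense of (9.72).

    region_probs[j] is the probability of fading region j, rates[j] is k_j,
    and region_bers[j] is the mean of P_b(gamma) within region j.
    """
    bits_in_error = sum(p * k * b for p, k, b in zip(region_probs, rates, region_bers))
    bits_sent = sum(p * k for p, k in zip(region_probs, rates))
    return bits_in_error / bits_sent
```

For two equally likely regions with rates 2 and 6 and region BERs of $10^{-2}$ and $10^{-4}$, the rate-weighted average is $0.002575$, noticeably lower than the unweighted mean of the two BERs because most bits are sent in the better region.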



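We now derive the optimal continuous rate, power, and BER adaptation to maximize spectral efficiency
$E[k(\gamma)]$ subject to an average power constraint $\bar{S}$ and the average BER constraint (9.71). As with the
instantaneous BER constraint, this is a standard constrained optimization problem, which we solve using the Lagrange
method. We now require two Lagrange multipliers for the two constraints: average power and average BER. Specifically,
the Lagrange equation is

$$J(k(\gamma), S(\gamma)) = \int_0^\infty k(\gamma)p(\gamma)\,d\gamma + \lambda_1\left[\int_0^\infty P_b(\gamma)k(\gamma)p(\gamma)\,d\gamma - \bar{P}_b\int_0^\infty k(\gamma)p(\gamma)\,d\gamma\right] + \lambda_2\left[\int_0^\infty S(\gamma)p(\gamma)\,d\gamma - \bar{S}\right]. \tag{9.73}$$

The optimal rate and power adaptation must satisfy

$$\frac{\partial J}{\partial k(\gamma)} = 0 \quad\text{and}\quad \frac{\partial J}{\partial S(\gamma)} = 0, \tag{9.74}$$

with the additional constraint that $k(\gamma)$ and $S(\gamma)$ are nonnegative for all $\gamma$.
Assume that $P_b$ is approximated using the general formula (9.43). Define

$$f(k(\gamma)) = 2^{c_3k(\gamma)} - c_4. \tag{9.75}$$

Then using (9.43) in (9.73) and solving (9.74) we get that the power and BER adaptation that maximize spectral
efficiency satisfy

$$\frac{S(\gamma)}{\bar{S}} = \max\left[\frac{f(k(\gamma))\,(\lambda_1\bar{P}_b - 1)}{\frac{\partial f(k(\gamma))}{\partial k(\gamma)}\,\lambda_2\bar{S}} - \frac{f(k(\gamma))^2}{c_2\gamma\,\frac{\partial f(k(\gamma))}{\partial k(\gamma)}\,k(\gamma)},\ 0\right] \tag{9.76}$$

for nonnegative $k(\gamma)$ and

$$P_b(\gamma) = \frac{\lambda_2\bar{S}f(k(\gamma))}{\lambda_1c_2\gamma k(\gamma)}. \tag{9.77}$$

Moreover, from (9.43), (9.76), and (9.77) we get that the optimal rate adaptation $k(\gamma)$ is either zero or the
nonnegative solution of

$$\frac{\lambda_1\bar{P}_b - 1}{\frac{\partial f(k(\gamma))}{\partial k(\gamma)}\,\lambda_2\bar{S}} - \frac{f(k(\gamma))}{c_2\gamma\,\frac{\partial f(k(\gamma))}{\partial k(\gamma)}\,k(\gamma)} = \frac{1}{\gamma c_2}\ln\left[\frac{\lambda_1c_1c_2\gamma k(\gamma)}{\lambda_2\bar{S}f(k(\gamma))}\right]. \tag{9.78}$$

The values of $k(\gamma)$ and the Lagrange multipliers $\lambda_1$ and $\lambda_2$ must be found through a numerical search such that the average
power constraint $\bar{S}$ and average BER constraint (9.71) are satisfied.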

In the discrete rate case, the rate is varied within a fixed set $k_0, \ldots, k_{N-1}$, where $k_0$ corresponds to no data
transmission. We must determine region boundaries $\gamma_0, \ldots, \gamma_{N-2}$ such that we assign rate $k_j$ to the rate region
$[\gamma_{j-1}, \gamma_j)$, where we set $\gamma_{-1} = 0$ and $\gamma_{N-1} = \infty$. Under this rate assignment we wish to maximize spectral
efficiency through optimal rate, power, and BER adaptation subject to an average power and BER constraint.
Since the set of possible rates and their corresponding rate region assignments are fixed, the optimal rate adaptation
corresponds to finding the optimal rate region boundaries $\gamma_j$, $j = 0, \ldots, N-2$. The Lagrangian for this constrained
optimization problem is

$$J(\gamma_0, \gamma_1, \ldots, \gamma_{N-2}, S(\gamma)) = \sum_{j=1}^{N-1} k_j\int_{\gamma_{j-1}}^{\gamma_j} p(\gamma)\,d\gamma + \lambda_1\left[\sum_{j=1}^{N-1} k_j\int_{\gamma_{j-1}}^{\gamma_j}\left(P_b(\gamma) - \bar{P}_b\right)p(\gamma)\,d\gamma\right] + \lambda_2\left[\int_{\gamma_0}^{\infty} S(\gamma)p(\gamma)\,d\gamma - \bar{S}\right]. \tag{9.79}$$

The optimal power adaptation is obtained by solving the following equation for $S(\gamma)$:

$$\frac{\partial J}{\partial S(\gamma)} = 0. \tag{9.80}$$

Similarly, the optimal rate region boundaries are obtained by solving the following set of equations for $\gamma_j$:

$$\frac{\partial J}{\partial\gamma_j} = 0, \quad 0 \le j \le N-2. \tag{9.81}$$

From (9.80) we see that the optimal power and BER adaptation must satisfy

$$\frac{\partial P_b(\gamma)}{\partial S(\gamma)} = \frac{-\lambda_2}{k_j\lambda_1}, \quad \gamma_{j-1} \le \gamma \le \gamma_j. \tag{9.82}$$

Substituting (9.43) into (9.82) we get that

$$P_b(\gamma) = \lambda\,\frac{f(k_j)}{\gamma k_j}, \quad \gamma_{j-1} \le \gamma \le \gamma_j, \tag{9.83}$$



where $\lambda = \frac{\bar{S}\lambda_2}{c_2\lambda_1}$. This form of BER adaptation is similar to the water-filling power adaptation: the instantaneous
BER decreases as the channel quality improves. Now setting the BER in (9.43) equal to (9.83) and solving for
$S(\gamma)$ yields

$$S(\gamma) = S_j(\gamma), \quad \gamma_{j-1} \le \gamma \le \gamma_j, \tag{9.84}$$

where

$$\frac{S_j(\gamma)}{\bar{S}} = \ln\left[\frac{\lambda f(k_j)}{c_1\gamma k_j}\right]\frac{f(k_j)}{-\gamma c_2}, \quad 1 \le j \le N-1, \tag{9.85}$$

and $S(\gamma) = 0$ for $\gamma < \gamma_0$. We see from (9.85) that $S(\gamma)$ is discontinuous at the $\gamma_j$ boundaries.

Let us now consider the optimal region boundaries $\gamma_0, \ldots, \gamma_{N-2}$. Solving (9.81) for $P_b(\gamma_j)$ yields

$$P_b(\gamma_j) = \bar{P}_b - \frac{1}{\lambda_1} - \frac{\lambda_2}{\lambda_1}\,\frac{S_{j+1}(\gamma_j) - S_j(\gamma_j)}{k_{j+1} - k_j}, \quad 0 \le j \le N-2, \tag{9.86}$$

where $k_0 = 0$ and $S_0(\gamma) = 0$. Unfortunately, this set of equations is very difficult to solve for the optimal boundary
points $\{\gamma_j\}$. However, if we assume that $S(\gamma)$ is continuous at each boundary then we get that

$$P_b(\gamma_j) = \bar{P}_b - \frac{1}{\lambda}, \quad 0 \le j \le N-2, \tag{9.87}$$

for some constant $\lambda$. Under this assumption we can solve for the suboptimal rate region boundaries as

$$\gamma_{j-1} = \frac{f(k_j)}{k_j}\,\rho, \quad 1 \le j \le N-1, \tag{9.88}$$

for some constant $\rho$. The constants $\lambda$ and $\rho$ are found numerically such that the average power constraint

$$\sum_{j=1}^{N-1}\int_{\gamma_{j-1}}^{\gamma_j}\frac{S_j(\gamma)}{\bar{S}}\,p(\gamma)\,d\gamma = 1 \tag{9.89}$$

and BER constraint (9.72) are satisfied. Note that the region boundaries (9.88) are suboptimal since $S(\gamma)$ is
not necessarily continuous at the boundary regions, and therefore these boundaries yield a suboptimal spectral
efficiency.
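The BER adaptation (9.83) and the power adaptation (9.85) should agree with the general BER form (9.43) within a fading region. The sketch below checks this numerically for one SNR and one rate. The function names and the sample value of the constant are assumptions of this example; in practice the constant is found from the power and BER constraints.

```python
import math

def f(k, c3, c4):
    """f(k) from (9.75)."""
    return 2.0 ** (c3 * k) - c4

def ber_adaptation(gamma, k, lam, c3, c4):
    """Instantaneous BER within a region, (9.83)."""
    return lam * f(k, c3, c4) / (gamma * k)

def power_adaptation(gamma, k, lam, c1, c2, c3, c4):
    """S_j(gamma)/S_bar within a region, (9.85)."""
    fk = f(k, c3, c4)
    return math.log(lam * fk / (c1 * gamma * k)) * fk / (-gamma * c2)

def ber_general(gamma, power_ratio, k, c1, c2, c3, c4):
    """General BER form (9.43)."""
    return c1 * math.exp(-c2 * gamma * power_ratio / f(k, c3, c4))
```

Plugging the output of `power_adaptation` into `ber_general` returns the same value as `ber_adaptation`, and `ber_adaptation` falls as the SNR rises within a region, which is the behavior the text compares to water-filling.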

In Figure 9.16 we plot average spectral efficiency for adaptive MQAM under both continuous and discrete
rate adaptation, and both average and instantaneous BER targets for a Rayleigh fading channel. The adaptive
policies are based on the BER approximation (9.7) with a target BER of either $10^{-3}$ or $10^{-7}$. For the discrete
rate cases we assume that 6 different MQAM signal constellations are available (7 fading regions) given by $\mathcal{M} =
\{0, 4, 16, 64, 256, 1024, 4096\}$. We see in this figure that the spectral efficiencies of all four policies under the
same instantaneous or average BER target are very close to each other. For discrete-rate adaptation, the spectral
efficiency with an instantaneous BER target is slightly higher than under an average BER target even though the
latter case is less constrained: that is because the efficiency under an average BER target is calculated with
suboptimal rate region boundaries, which leads to a slight efficiency degradation.



Figure 9.16: Spectral Efficiency for Different Adaptation Constraints.

9.5 Adaptive Techniques in Combined Fast and Slow Fading

In this section we examine adaptive techniques for composite fading channels consisting of both fast and slow
fading (shadowing). We assume the fast fading changes too quickly to accurately measure and feed back to the
transmitter, so the transmitter only adapts to the slow fading. The instantaneous SNR $\gamma$ has distribution $p(\gamma|\bar\gamma)$
where $\bar\gamma$ is a short-term average over the fast fading. This short-term average varies slowly due to shadowing and
has a distribution $p(\bar\gamma)$ where the average SNR relative to this distribution is $\bar{\bar\gamma}$. The transmitter adapts only to
the slow fading $\bar\gamma$, hence its rate $k(\bar\gamma)$ and power $S(\bar\gamma)$ are functions of $\bar\gamma$. The power adaptation is subject to a
long-term average power constraint over both the fast and slow fading:

$$\int_0^\infty S(\bar\gamma)p(\bar\gamma)\,d\bar\gamma = \bar{S}. \tag{9.90}$$

As above, we approximate the instantaneous probability of bit error by the general form (9.43). Since the
power and rate are functions of $\bar\gamma$, the conditional BER, conditioned on $\bar\gamma$, is

$$P_b(\gamma|\bar\gamma) \approx c_1\exp\left[\frac{-c_2\gamma\,S(\bar\gamma)/\bar{S}}{2^{c_3k(\bar\gamma)} - c_4}\right]. \tag{9.91}$$

Since the transmitter does not adapt to the fast fading $\gamma$, we cannot require a given instantaneous BER. However,
since the transmitter adapts to the shadowing, we can require a target average probability of bit error averaged over
the fast fading for a fixed value of the shadowing. This short-term average for a given $\bar\gamma$ is obtained by averaging
$P_b(\gamma|\bar\gamma)$ over the fast fading distribution $p(\gamma|\bar\gamma)$:

$$\bar{P}_b(\bar\gamma) = \int_0^\infty P_b(\gamma|\bar\gamma)\,p(\gamma|\bar\gamma)\,d\gamma. \tag{9.92}$$

Using (9.91) in (9.92) and assuming Rayleigh fading for the fast fading, this becomes

$$\bar{P}_b(\bar\gamma) = \frac{1}{\bar\gamma}\int_0^\infty c_1\exp\left[\frac{-c_2\gamma\,S(\bar\gamma)/\bar{S}}{2^{c_3k(\bar\gamma)} - c_4} - \frac{\gamma}{\bar\gamma}\right]d\gamma = \frac{c_1}{\frac{c_2\bar\gamma\,S(\bar\gamma)/\bar{S}}{2^{c_3k(\bar\gamma)} - c_4} + 1}. \tag{9.93}$$

For example, with MQAM modulation with the tight BER bound (9.7), (9.93) becomes

$$\bar{P}_b(\bar\gamma) = \frac{0.2}{\frac{1.5\bar\gamma\,S(\bar\gamma)/\bar{S}}{2^{k(\bar\gamma)} - 1} + 1}. \tag{9.94}$$



We can now invert (9.93) to obtain the adaptive rate $k(\bar\gamma)$ as a function of the target average BER $\bar{P}_b$ and the
power adaptation $S(\bar\gamma)$ as

$$k(\bar\gamma) = \frac{1}{c_3}\log_2\left(c_4 + \frac{K\bar\gamma S(\bar\gamma)}{\bar{S}}\right), \tag{9.95}$$

where

$$K = \frac{c_2}{c_1/\bar{P}_b - 1} \tag{9.96}$$

only depends on the target average BER and decreases as this target decreases. We maximize spectral efficiency
by maximizing

$$E[k(\bar\gamma)] = \int_0^\infty\frac{1}{c_3}\log_2\left(c_4 + \frac{K\bar\gamma S(\bar\gamma)}{\bar{S}}\right)p(\bar\gamma)\,d\bar\gamma \tag{9.97}$$

subject to the average power constraint (9.90).
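The closed form in (9.93) and its inversion in (9.95) and (9.96) can be checked numerically. The sketch below compares the closed form against a direct midpoint-rule integration over Rayleigh fast fading and then verifies that the inverted rate meets the target. Function names, the integration range, and the step count are assumptions of this example; SNRs are in linear units.

```python
import math

def avg_ber_closed_form(mean_snr, power_ratio, k, c1, c2, c3, c4):
    """Right side of (9.93); mean_snr is the short-term average SNR."""
    a = c2 * power_ratio / (2.0 ** (c3 * k) - c4)
    return c1 / (a * mean_snr + 1.0)

def avg_ber_numeric(mean_snr, power_ratio, k, c1, c2, c3, c4, steps=4000):
    """Midpoint-rule estimate of the integral in (9.93)."""
    a = c2 * power_ratio / (2.0 ** (c3 * k) - c4)
    decay = a + 1.0 / mean_snr      # combined exponential decay rate
    upper = 40.0 / decay            # integrand is negligible beyond this
    d = upper / steps
    total = 0.0
    for i in range(steps):
        g = (i + 0.5) * d
        total += c1 * math.exp(-decay * g) / mean_snr
    return total * d

def rate_for_target(mean_snr, power_ratio, target_avg_ber, c1, c2, c3, c4):
    """Rate from (9.95) with K from (9.96); returns 0 if no positive rate exists."""
    big_k = c2 / (c1 / target_avg_ber - 1.0)
    arg = c4 + big_k * mean_snr * power_ratio
    return math.log2(arg) / c3 if arg > 1.0 else 0.0
```

Because the target average BER enters only through the constant in (9.96), tightening the target lowers that constant and therefore lowers the rate returned for the same SNR and power.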

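Let us assume that $c_4 > 0$. Then this maximization and the power constraint are in the exact same form as
(9.11) with the fading $\gamma$ replaced by the slow fading $\bar\gamma$. Thus, the optimal power adaptation also has the same
water-filling form as (9.13) and is given by

$$\frac{S(\bar\gamma)}{\bar{S}} = \begin{cases} \frac{1}{\bar\gamma_0} - \frac{c_4}{\bar\gamma K} & \bar\gamma \ge c_4\bar\gamma_0/K \\ 0 & \bar\gamma < c_4\bar\gamma_0/K, \end{cases} \tag{9.98}$$

where the channel is not used when $\bar\gamma < c_4\bar\gamma_0/K$. The value of $\bar\gamma_0$ is determined by the average power constraint.
Substituting (9.98) into (9.95) yields the rate adaptation

$$k(\bar\gamma) = \frac{1}{c_3}\log_2\left(\frac{K\bar\gamma}{\bar\gamma_0}\right), \tag{9.99}$$

and the corresponding average spectral efficiency is given by

$$\frac{R}{B} = \int_{c_4\bar\gamma_0/K}^{\infty}\frac{1}{c_3}\log_2\left(\frac{K\bar\gamma}{\bar\gamma_0}\right)p(\bar\gamma)\,d\bar\gamma. \tag{9.100}$$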


Thus we see that in a composite fading channel where rate and power are only adapted to the slow fading, for
$c_4 > 0$ in (9.43), water-filling relative to the slow fading is the optimal power adaptation to maximize spectral
efficiency subject to an average BER constraint.

Our derivation has assumed that the fast fading is Rayleigh, but it can be shown that with $c_4 > 0$ in (9.43),
the optimal power and rate adaptation for any fast fading distribution have the same water-filling form [35]. Since
we have assumed $c_4 > 0$ in (9.43), the positivity constraint on power dictates the cutoff value below which the
channel is not used. As we saw in Section 9.4.1, when $c_4 \le 0$ the positivity constraint on rate dictates this cutoff,
and the optimal power adaptation becomes inverse water-filling for $c_4 < 0$ and on-off power adaptation for $c_4 = 0$.

Chapter 9 Problems



1. Find the average SNR required to achieve an average BER of $\bar{P}_b = 10^{-3}$ for 8PSK modulation in Rayleigh
fading. What is the spectral efficiency of this scheme assuming a symbol time $T_s = 1/B$?

2. Consider a truncated channel inversion variable-power technique for Rayleigh fading with average SNR of
20 dB. What value of $\sigma$ corresponds to an outage probability of 0.1? What is the maximum size MQAM
constellation that can be transmitted under this policy so that in nonoutage, $P_b \approx 10^{-3}$?

3. Find the power adaptation for QPSK modulation that maintains a fixed $P_b = 10^{-3}$ in nonoutage for a
Rayleigh fading channel with $\bar\gamma = 20$ dB. What is the outage probability of this system?

4. Consider a variable-rate MQAM modulation scheme with just two constellations, $M = 4$ and $M = 16$.
Assume a target $P_b$ of approximately $10^{-3}$. If the target cannot be met then no data is transmitted.

(a) Using the BER bound (9.7) find the range of $\gamma$ values associated with the three possible transmission
schemes (no transmission, 4QAM, and 16QAM) where the BER target is met. What is the cutoff $\gamma_0$
below which the channel is not used?

(b) Assuming Rayleigh fading with $\bar\gamma = 20$ dB, find the average data rate of the variable-rate scheme.

(c) Suppose that instead of suspending transmission below $\gamma_0$, BPSK is transmitted for $0 \le \gamma \le \gamma_0$. Using
the loose bound (9.6), find the average probability of error for this BPSK transmission.

5. Consider an adaptive modulation and coding scheme consisting of 3 modulations: BPSK, QPSK, and 8PSK,
along with 3 block codes of rate 1/2, 1/3, and 1/4. Assume the first code provides roughly 3 dB of coding
gain for each modulation type, the second code provides 4 dB, and the third code provides 5 dB. For each
possible value of SNR $0 \le \gamma \le \infty$, find the combined coding and modulation with the maximum data rate
for a target BER of $10^{-3}$ (you can use any reasonable approximation for modulation BER in this calculation,
with SNR increased by the coding gain). What is the average data rate of the system for a Rayleigh fading
channel with average SNR of 20 dB, assuming no transmission if the target BER cannot be met with any
combination of modulation and coding?

6. Show that the spectral efficiency given by (9.11) with power constraint (9.12) is maximized by the
water-filling power adaptation (9.13) by setting up the Lagrangian equation, differentiating it, and solving for the
maximizing power adaptation. Also show that with this power adaptation, the rate adaptation is as given in
(9.16).

7. In this problem we compare the spectral efficiency of nonadaptive techniques with that of adaptive
techniques.

(a) Using the tight BER bound for MQAM modulation given by (9.7), find an expression for the average
probability of bit error in Rayleigh fading as a function of $M$ and $\bar\gamma$.

(b) Based on the expression found in part (a), find the maximum constellation size that can be transmitted
over a Rayleigh fading channel with a target average BER of $10^{-3}$, assuming $\bar\gamma = 20$ dB.

(c) Compare the spectral efficiency of part (b) with that of adaptive modulation shown in Figure 9.3 for
the same parameters. What is the spectral efficiency difference between the adaptive and nonadaptive
techniques?

8. Consider a Rayleigh fading channel with average SNR of 20 dB. Assume a target BER of $10^{-4}$.

(a) Find the optimal rate and power adaptation for variable-rate variable-power MQAM, including the
cutoff value $\gamma_0/K$ below which the channel is not used.

(b) Find the average spectral efficiency for the adaptive scheme derived in part (a).

(c) Compare your answer in part (b) to the spectral efficiency of truncated channel inversion, where $\gamma_0$ is
chosen to maximize this efficiency.

9. Consider a discrete time-varying AWGN channel with four channel states. Assuming a fixed transmit power
$\bar{S}$, the received SNR associated with each channel state is $\gamma_1 = 5$ dB, $\gamma_2 = 10$ dB, $\gamma_3 = 15$ dB, and $\gamma_4 = 20$
dB, respectively. The probabilities associated with the channel states are $p(\gamma_1) = 0.4$ and $p(\gamma_2) = p(\gamma_3) =
p(\gamma_4) = 0.2$.

(a) Find the optimal power and rate adaptation for continuous-rate adaptive MQAM on this channel.

(b) Find the average spectral efficiency with this optimal adaptation.

(c) Find the truncated channel inversion power control policy for this channel and the maximum data rate
that can be supported with this policy.

10. Consider a Rayleigh fading channel with an average received SNR of 20 dB and a required BER of 10 
Find the spectral efficiency of this channel using truncated channel inversion, assuming the constellation is 
restricted to size 0, 2, 4, 16, 64, or 256. 

11. Consider a Rayleigh fading channel with an average received SNR of 20 dB, a Doppler of 80 Hz, and a
required BER of $10^{-3}$.

(a) Suppose you use adaptive MQAM modulation on this channel with constellations restricted to size 0, 
2, 4, 16, and 64. Using 7 ^- = .1 find the fading regions Rj associated with each of these constellations. 
Also find the average spectral efficiency of this restricted adaptive modulation scheme and the average
time spent in each region $R_j$. If the symbol rate is $T_s = B^{-1}$, over approximately how many symbols
is each constellation transmitted before a change in constellation size is needed?

(b) Find the exact BER of your adaptive scheme using (9.33). How does it differ from the target BER? 

12. Consider a Rayleigh fading channel with an average received SNR of 20 dB, a signal bandwidth of 30 KHz, 
a Doppler of 80 Hz, and a required BER of 10 

(a) Suppose the estimation error $\epsilon = \hat\gamma/\gamma$ in a variable-rate variable-power MQAM system with a target
BER of $10^{-3}$ is uniformly distributed between 0.5 and 1.5. Find the resulting average probability of bit
error for this system.

(b) Find an expression for the average probability of error in a variable-rate variable-power MQAM system
where the SNR estimate $\hat\gamma$ available at the transmitter is both a delayed and noisy estimate of $\gamma$: $\hat\gamma(t) =
\gamma(t - \tau) + \gamma_\epsilon(t)$. What joint distribution is needed to compute this average?

13. Consider an adaptive trellis-coded MQAM system with a coding gain of 3 dB. Assume a Rayleigh fading 
channel with an average received SNR of 20 dB. Find the optimal adaptive power and rate policy for this 
system and the corresponding average spectral efficiency. 

14. In Chapter 6 a bound on F), for nonrectangular MQAM was given as F), ss |og l| A/ Q Find 

values for $c_1$, $c_2$, $c_3$, and $c_4$ for the general BER form (9.43) to approximate this bound with $M = 8$. Any
curve-approximation technique is acceptable. Plot both BER formulas for $0 \le \bar\gamma \le 30$ dB.

15. Show that the average spectral efficiency $E[k(\gamma)]$ for $k(\gamma)$ given by (9.44) with power constraint $\bar{S}$ is
maximized by the power adaptation (9.47).

16. In this problem we investigate the optimal adaptive modulation for MPSK modulation based on the three
BER bounds (9.51), (9.52), and (9.53). We assume a Rayleigh fading channel so that $\gamma$ is exponentially
distributed with $\bar\gamma = 30$ dB and a target BER of $P_b = 10^{-3}$.

(a) The cutoff fade depth $\gamma_0$ must satisfy

$$\int_{\gamma_0/K}^{\infty}\left(\frac{1}{\gamma_0} - \frac{1}{\gamma K}\right)p(\gamma)\,d\gamma \le 1$$

for $K$ given by (9.10). Find the cutoff value $\gamma_0$ corresponding to the power adaptation for each of the
three bounds.

(b) Plot $S(\gamma)/\bar{S}$ and $k(\gamma)$ as a function of $\gamma$ for Bounds 1, 2, and 3 for $\gamma$ ranging from 0 to 30 dB. Also state
whether the cutoff value below which the channel is not used is based on the power or rate positivity
constraint.

(c) How does the power adaptation associated with the different bounds differ at low SNRs? How about
at high SNRs?

17. Show that for general $M$-ary modulation, the power adaptation that maintains a target instantaneous BER is
given by (9.63). Also show that the region boundaries that maximize spectral efficiency, obtained using the
Lagrangian given in (9.65), are given by (9.67) and (9.68).

18. Show that for general $M$-ary modulation with an average target BER, the Lagrangian (9.80) implies that
the optimal power and BER adaptation must satisfy (9.82). Then show how (9.82) leads to BER adaptation 
given by (9.83), which in turn leads to the power adaptation given by (9.84)-(9.85). Finally, use (9.81) to 
show that the optimal rate region boundaries must satisfy (9.86). 

19. Consider adaptive MPSK where the constellation is restricted to either no transmission or $M = 2, 4, 8, 16$.
Assume the probability of error is approximated using (9.51). Find and plot the optimal discrete-rate and
power adaptation for $0 \le \gamma \le 30$ dB assuming a Rayleigh fading channel with $\bar\gamma = 20$ dB and a target $P_b$
of $10^{-4}$. What is the resulting average spectral efficiency?

20. We assume the same discrete-rate adaptive MPSK as in the previous problem, except now there is an average
BER target of $10^{-4}$ instead of an instantaneous target. Find the optimal discrete-rate and power adaptation for
a Rayleigh fading channel with $\bar\gamma = 20$ dB and the corresponding average spectral efficiency.

21. Consider a composite fading channel with fast Rayleigh fading and slow log-normal shadowing with an
average dB SNR $\mu_{\psi_{\mathrm{dB}}} = 20$ dB (averaged over both fast and slow fading) and $\sigma_{\psi_{\mathrm{dB}}} = 8$ dB. Assume an
adaptive MPSK modulation that adapts only to the shadowing, with a target average BER of $10^{-3}$. Using
the BER approximation (9.51) find the optimal power and rate adaptation policies as a function of the slow
fading $\bar\gamma$ that maximize average spectral efficiency while meeting the average BER target. Also determine
the average spectral efficiency that results from these policies.

22. In this chapter we determined the optimal adaptive rate and power policies to maximize average spectral
efficiency while meeting a target average BER in combined Rayleigh fading and shadowing. The derivation
assumed the general bound (9.43) with $c_4 > 0$. For the same composite channel, find the optimal adaptive
rate and power policies to maximize average spectral efficiency while meeting a target average BER
assuming $c_4 < 0$. Hint: the derivation is similar to the case of continuous rate adaptation using the second MPSK
bound and results in the same channel inversion power control.

23. As in the previous problem, we again examine the adaptive rate and power policies to maximize average
spectral efficiency while meeting a target average BER in combined Rayleigh fading and shadowing. In
this problem we assume the general bound (9.43) with $c_4 = 0$. For the composite channel, find the optimal
adaptive rate and power policies to maximize average spectral efficiency while meeting a target average BER
assuming $c_4 = 0$. Hint: the derivation is similar to that of Section 9.4.1 for the third MPSK bound and results
in the same on-off power control.

Chapter 10

Multiple Antennas and Space-Time 
Communications 



In this chapter we consider systems with multiple antennas at the transmitter and receiver, which are commonly
referred to as multiple-input multiple-output (MIMO) systems. The multiple antennas can be used to increase
data rates through multiplexing or to improve performance through diversity. We have already seen diversity in 
Chapter 7. In MIMO systems the transmit and receive antennas can both be used for diversity gain. Multiplexing 
is obtained by exploiting the structure of the channel gain matrix to obtain independent signalling paths that can 
be used to send independent data. Indeed, the initial excitement about MIMO was sparked by the pioneering 
work of Winters [1], Foschini [2], Gans [3], and Telatar [4] [5] predicting remarkable spectral efficiencies for 
wireless systems with multiple transmit and receive antennas. These spectral efficiency gains often require accurate 
knowledge of the channel at the receiver, and sometimes at the transmitter as well. In addition to spectral efficiency 
gains, ISI and interference from other users can be reduced using smart antenna techniques. The cost of the 
performance enhancements obtained through MIMO techniques is the added cost of deploying multiple antennas, 
the space and power requirements of these extra antennas (especially on small handheld units), and the added 
complexity required for multi-dimensional signal processing. In this chapter we examine these different uses for 
multiple antennas and find their performance advantages. The mathematics in this chapter uses several key results 
from matrix theory: Appendix C provides a brief overview of these results. 

10.1 Narrowband MIMO Model 

In this section we consider a narrowband MIMO channel. A narrowband point-to-point communication system of
$M_t$ transmit and $M_r$ receive antennas is shown in Figure 10.1. This system can be represented by the following
discrete time model:

\[
\begin{bmatrix} y_1 \\ \vdots \\ y_{M_r} \end{bmatrix} =
\begin{bmatrix} h_{11} & \cdots & h_{1M_t} \\ \vdots & \ddots & \vdots \\ h_{M_r 1} & \cdots & h_{M_r M_t} \end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_{M_t} \end{bmatrix} +
\begin{bmatrix} n_1 \\ \vdots \\ n_{M_r} \end{bmatrix}
\]

or simply as $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$. Here $\mathbf{x}$ represents the $M_t$-dimensional transmitted symbol, $\mathbf{n}$ is the $M_r$-dimensional
noise vector, and $\mathbf{H}$ is the $M_r \times M_t$ matrix of channel gains $h_{ij}$ representing the gain from transmit antenna $j$
to receive antenna $i$. We assume a channel bandwidth of $B$ and complex Gaussian noise with zero mean and
covariance matrix $\sigma_n^2 \mathbf{I}_{M_r}$, where typically $\sigma_n^2 = N_0 B$. For simplicity, given a transmit power constraint $P$ we will
assume an equivalent model with a noise power of unity and transmit power $P/\sigma_n^2 = \rho$, where $\rho$ can be interpreted
as the average SNR per receive antenna under unity channel gain. This power constraint implies that the input
symbols satisfy

\[ \sum_{i=1}^{M_t} E[x_i x_i^*] = \rho, \qquad (10.1) \]

or, equivalently, that $\mathrm{Tr}(\mathbf{R}_x) = \rho$, where $\mathrm{Tr}(\mathbf{R}_x)$ is the trace of the input covariance matrix $\mathbf{R}_x = E[\mathbf{x}\mathbf{x}^H]$.

Figure 10.1: MIMO Systems.

Different assumptions can be made about the knowledge of the channel gain matrix H at the transmitter 
and receiver, referred to as channel side information at the transmitter (CSIT) and channel side information at 
the receiver (CSIR), respectively. For a static channel CSIR is typically assumed, since the channel gains can be 
obtained fairly easily by sending a pilot sequence for channel estimation. More details on estimation techniques 
for MIMO channels can be found in [10, Chapter 3.9]. If a feedback path is available then CSIR from the receiver
can be sent back to the transmitter to provide CSIT. CSIT may also be available in time-division duplexing systems
without a feedback path by exploiting reciprocal properties of propagation. When the channel is not known at either
the transmitter or receiver then some distribution on the channel gain matrix must be assumed. The most common
model for this distribution is a zero-mean spatially white (ZMSW) model, where the entries of $\mathbf{H}$ are assumed
to be i.i.d. zero-mean, unit-variance, complex circularly symmetric Gaussian random variables.¹ We adopt this
model unless stated otherwise. Alternatively, these entries may be complex circularly symmetric Gaussian random 
variables with a non-zero mean or with a covariance matrix not equal to the identity matrix. In general, different 
assumptions about CSI and about the distribution of the H entries lead to different channel capacities and different 
approaches to space-time signalling. 
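
The ZMSW model and the narrowband relation $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ can be illustrated with a minimal sketch. It assumes unit noise power (the normalized model above); the function names `zmsw_channel` and `channel_output` are illustrative and not part of the text.

```python
import math
import random


def zmsw_channel(m_r, m_t, rng):
    """Draw an M_r x M_t matrix with i.i.d. zero-mean, unit-variance,
    circularly symmetric complex Gaussian entries (ZMSW model)."""
    s = math.sqrt(0.5)  # real and imaginary parts each carry variance 1/2
    return [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m_t)]
            for _ in range(m_r)]


def channel_output(h, x, rng):
    """Return y = Hx + n with unit-variance complex Gaussian noise per antenna."""
    s = math.sqrt(0.5)
    return [sum(h_ij * x_j for h_ij, x_j in zip(row, x))
            + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            for row in h]


rng = random.Random(0)
H = zmsw_channel(2, 3, rng)          # M_r = 2 receive, M_t = 3 transmit antennas
x = [1 + 0j, 0j, 0j]                 # all power on the first transmit antenna
y = channel_output(H, x, rng)        # one received vector of length M_r
```

Averaging $|h_{ij}|^2$ over many draws should give a value close to one, which is the unit-variance assumption of the model.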

Optimal decoding of the received signal requires ML demodulation. If the symbols modulated onto each of
the $M_t$ transmit antennas are chosen from an alphabet of size $|\mathcal{X}|$, then because of the cross-coupling between
transmitted symbols at the receiver antennas, ML demodulation requires an exhaustive search over all $|\mathcal{X}|^{M_t}$
possible input vectors of $M_t$ symbols. For general channel matrices, when the transmitter does not know $\mathbf{H}$ this
complexity cannot be reduced further. This decoding complexity is typically prohibitive for even a small number of
transmit antennas. However, decoding complexity is significantly reduced if the channel is known at the transmitter,
as shown in Section 10.2.

¹ A complex Gaussian vector $\mathbf{x}$ is circularly symmetric if

\[ E\left[(\hat{\mathbf{x}} - E[\hat{\mathbf{x}}])(\hat{\mathbf{x}} - E[\hat{\mathbf{x}}])^H\right] = .5 \begin{bmatrix} \mathrm{Re}\{\mathbf{Q}\} & -\mathrm{Im}\{\mathbf{Q}\} \\ \mathrm{Im}\{\mathbf{Q}\} & \mathrm{Re}\{\mathbf{Q}\} \end{bmatrix} \]

for some Hermitian non-negative definite matrix $\mathbf{Q}$, where $\hat{\mathbf{x}} = [\mathrm{Re}\{\mathbf{x}\}^T \; \mathrm{Im}\{\mathbf{x}\}^T]^T$ stacks the real and imaginary parts of $\mathbf{x}$.
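
The exponential search described above can be made concrete with a short sketch. It assumes Gaussian noise, so that ML demodulation reduces to minimizing $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$; the names `ml_detect` and `search_size` are illustrative.

```python
import itertools


def ml_detect(h, y, alphabet):
    """Exhaustive ML detection: minimise ||y - Hx||^2 over all |X|^Mt
    candidate vectors (valid for Gaussian noise)."""
    m_t = len(h[0])
    best, best_metric = None, float("inf")
    for cand in itertools.product(alphabet, repeat=m_t):
        metric = sum(
            abs(y_i - sum(h_ij * x_j for h_ij, x_j in zip(row, cand))) ** 2
            for row, y_i in zip(h, y))
        if metric < best_metric:
            best, best_metric = cand, metric
    return best


def search_size(alphabet_size, m_t):
    """Number of candidate vectors the exhaustive search must visit."""
    return alphabet_size ** m_t
```

For a QPSK alphabet and three transmit antennas the search already covers $4^3 = 64$ candidates; with 16-QAM and four antennas it would cover $16^4 = 65536$.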



10.2 Parallel Decomposition of the MIMO Channel 

We have seen in Chapter 7 that multiple antennas at the transmitter or receiver can be used for diversity gain. When 
both the transmitter and receiver have multiple antennas, there is another mechanism for performance gain called 
multiplexing gain. The multiplexing gain of a MIMO system results from the fact that a MIMO channel can 
be decomposed into a number $R$ of parallel independent channels. By multiplexing independent data onto these
independent channels, we get an $R$-fold increase in data rate in comparison to a system with just one antenna at
the transmitter and receiver. This increased data rate is called the multiplexing gain. In this section we describe 
how to obtain independent channels from a MIMO system. 

Consider a MIMO channel with $M_r \times M_t$ channel gain matrix $\mathbf{H}$ known to both the transmitter and the
receiver. Let $R_H$ denote the rank of $\mathbf{H}$. From matrix theory, for any matrix $\mathbf{H}$ we can obtain its singular value
decomposition (SVD) as

\[ \mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H, \qquad (10.2) \]

where the $M_r \times M_r$ matrix $\mathbf{U}$ and the $M_t \times M_t$ matrix $\mathbf{V}$ are unitary matrices² and $\boldsymbol{\Sigma}$ is an $M_r \times M_t$ diagonal matrix
of singular values $\{\sigma_i\}$ of $\mathbf{H}$. These singular values have the property that $\sigma_i = \sqrt{\lambda_i}$ for $\lambda_i$ the $i$th eigenvalue of
$\mathbf{H}\mathbf{H}^H$, and $R_H$ of these singular values are nonzero. Since $R_H$ cannot
exceed the number of columns or rows of $\mathbf{H}$, $R_H \le \min(M_t, M_r)$. If $\mathbf{H}$ is full rank, which is sometimes referred
to as a rich scattering environment, then $R_H = \min(M_t, M_r)$. Other environments may lead to a low rank $\mathbf{H}$: a
channel with high correlation among the gains in $\mathbf{H}$ may have rank 1.

The parallel decomposition of the channel is obtained by defining a transformation on the channel input and
output $\mathbf{x}$ and $\mathbf{y}$ through transmit precoding and receiver shaping. In transmit precoding the input to the antennas
$\mathbf{x}$ is generated through a linear transformation on input vector $\tilde{\mathbf{x}}$ as $\mathbf{x} = \mathbf{V}\tilde{\mathbf{x}}$. Receiver shaping performs a similar
operation at the receiver by multiplying the channel output $\mathbf{y}$ with $\mathbf{U}^H$ to obtain $\tilde{\mathbf{y}} = \mathbf{U}^H\mathbf{y}$, as shown in Figure 10.2.

Figure 10.2: Transmit Precoding and Receiver Shaping.

The transmit precoding and receiver shaping transform the MIMO channel into $R_H$ parallel single-input
single-output (SISO) channels with input $\tilde{\mathbf{x}}$ and output $\tilde{\mathbf{y}}$, since from the SVD, we have that

\[
\begin{aligned}
\tilde{\mathbf{y}} &= \mathbf{U}^H(\mathbf{H}\mathbf{x} + \mathbf{n}) \\
&= \mathbf{U}^H(\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\mathbf{x} + \mathbf{n}) \\
&= \mathbf{U}^H(\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\mathbf{V}\tilde{\mathbf{x}} + \mathbf{n}) \\
&= \mathbf{U}^H\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\mathbf{V}\tilde{\mathbf{x}} + \mathbf{U}^H\mathbf{n} \\
&= \boldsymbol{\Sigma}\tilde{\mathbf{x}} + \tilde{\mathbf{n}},
\end{aligned}
\]

where $\tilde{\mathbf{n}} = \mathbf{U}^H\mathbf{n}$ and $\boldsymbol{\Sigma}$ is the diagonal matrix of singular values of $\mathbf{H}$ with $\sigma_i$ on the $i$th diagonal. Note that
multiplication by a unitary matrix does not change the distribution of the noise, i.e. $\mathbf{n}$ and $\tilde{\mathbf{n}}$ are identically
distributed. Thus, the transmit precoding and receiver shaping transform the MIMO channel into $R_H$ parallel
independent channels where the $i$th channel has input $\tilde{x}_i$, output $\tilde{y}_i$, noise $\tilde{n}_i$, and channel gain $\sigma_i$. Note that the
$\sigma_i$'s are related since they are all functions of $\mathbf{H}$, but since the resulting parallel channels do not interfere with each
other, we say that the channels with these gains are independent, linked only through the total power constraint.
This parallel decomposition is shown in Figure 10.3. Since the parallel channels do not interfere with each other,
the optimal ML demodulation complexity is linear in $R_H$, the number of independent paths that need to be decoded.
Moreover, by sending independent data across each of the parallel channels, the MIMO channel can support $R_H$
times the data rate of a system with just one transmit and receive antenna, leading to a multiplexing gain of $R_H$.
Note, however, that the performance on each of the channels will depend on its gain $\sigma_i$. The next section will more
precisely characterize the multiplexing gain associated with the Shannon capacity of the MIMO channel.

² $\mathbf{U}$ and $\mathbf{V}$ unitary imply $\mathbf{U}^H\mathbf{U} = \mathbf{I}_{M_r}$ and $\mathbf{V}^H\mathbf{V} = \mathbf{I}_{M_t}$.




Figure 10.3: Parallel Decomposition of the MIMO Channel. 



Example 10.1: Find the equivalent parallel channel model for a MIMO channel with channel gain matrix

\[ \mathbf{H} = \begin{bmatrix} .1 & .3 & .7 \\ .5 & .4 & .1 \\ .2 & .6 & .8 \end{bmatrix}. \qquad (10.3) \]

Solution: The SVD of $\mathbf{H}$ is given by

\[
\mathbf{H} =
\begin{bmatrix} -.555 & .3764 & -.7418 \\ -.3338 & -.9176 & -.2158 \\ -.7619 & .1278 & .6349 \end{bmatrix}
\begin{bmatrix} 1.3333 & 0 & 0 \\ 0 & .5129 & 0 \\ 0 & 0 & .0965 \end{bmatrix}
\begin{bmatrix} -.2811 & -.7713 & -.5710 \\ -.5679 & -.3459 & .7469 \\ -.7736 & .5342 & -.3408 \end{bmatrix}^H.
\qquad (10.4)
\]

Thus, since there are 3 nonzero singular values, $R_H = 3$, leading to three parallel channels, with channel gains
$\sigma_1 = 1.3333$, $\sigma_2 = .5129$, and $\sigma_3 = .0965$, respectively. Note that the channels have diminishing gain, with
a very small gain on the third channel. Hence, this last channel will either have a high error probability or a low
capacity.
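
The decomposition in this example can be checked numerically. The following is a minimal sketch that assumes NumPy is available; `numpy.linalg.svd` returns $\mathbf{V}^H$ directly, and the small noise level and random data vector are arbitrary choices made for illustration.

```python
import numpy as np

# Channel gain matrix from Example 10.1
H = np.array([[0.1, 0.3, 0.7],
              [0.5, 0.4, 0.1],
              [0.2, 0.6, 0.8]])

U, sigma, Vh = np.linalg.svd(H)      # H = U @ diag(sigma) @ Vh, with Vh = V^H
V = Vh.conj().T

rng = np.random.default_rng(0)
x_tilde = rng.standard_normal(3)     # data symbols for the parallel channels
n = 0.01 * rng.standard_normal(3)    # small additive noise (illustrative level)

x = V @ x_tilde                      # transmit precoding: x = V x_tilde
y = H @ x + n                        # physical MIMO channel
y_tilde = U.conj().T @ y             # receiver shaping: y_tilde = U^H y
n_tilde = U.conj().T @ n             # rotated noise

# Each output depends only on its own input: y_tilde_i = sigma_i x_tilde_i + n_tilde_i
residual = y_tilde - (sigma * x_tilde + n_tilde)
```

The entries of `residual` should be at the level of floating-point rounding, confirming that the shaped output contains no cross-coupling between the three parallel channels.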

10.3 MIMO Channel Capacity



This section focuses on the Shannon capacity of a MIMO channel, which equals the maximum data rate that can be 
transmitted over the channel with arbitrarily small error probability. Capacity versus outage defines the maximum 
rate that can be transmitted over the channel with some nonzero outage probability. Channel capacity depends on 
what is known about the channel gain matrix or its distribution at the transmitter and/or receiver. Throughout this 
section it is assumed that the receiver has knowledge of the channel matrix H, since for static channels a good 
estimate of H can be obtained fairly easily. First the static channel capacity will be given, which forms the basis 
for the subsequent section on capacity of fading channels. 

10.3.1 Static Channels 

The capacity of a MIMO channel is an extension of the mutual information formula for a SISO channel given 
by (4.3) in Chapter 4 to a matrix channel. Specifically, the capacity is given in terms of the mutual information 
between the channel input vector x and output vector y as 

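\[ C = \max_{p(\mathbf{x})} I(\mathbf{X};\mathbf{Y}) = \max_{p(\mathbf{x})} \left[ H(\mathbf{Y}) - H(\mathbf{Y}|\mathbf{X}) \right], \qquad (10.5) \]

for $H(\mathbf{Y})$ and $H(\mathbf{Y}|\mathbf{X})$ the entropy in $\mathbf{y}$ and $\mathbf{y}|\mathbf{x}$, as defined in Chapter 4.1.³ The definition of entropy yields that
$H(\mathbf{Y}|\mathbf{X}) = H(\mathbf{N})$, the entropy in the noise. Since this noise $\mathbf{n}$ has fixed entropy independent of the channel input,
maximizing mutual information is equivalent to maximizing the entropy in $\mathbf{y}$.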

The entropy of $\mathbf{y}$ depends on its covariance matrix, which for the narrowband MIMO model is
given by

\[ \mathbf{R}_y = E[\mathbf{y}\mathbf{y}^H] = \mathbf{H}\mathbf{R}_x\mathbf{H}^H + \mathbf{I}_{M_r}, \qquad (10.6) \]

where $\mathbf{R}_x$ is the covariance of the MIMO channel input. It turns out that for all random vectors with a given
covariance matrix $\mathbf{R}_y$, the entropy of $\mathbf{y}$ is maximized when $\mathbf{y}$ is a zero-mean circularly symmetric complex Gaussian
(ZMCSCG) random vector [5]. But $\mathbf{y}$ is only ZMCSCG if the input $\mathbf{x}$ is ZMCSCG, and therefore this is the
optimal distribution on $\mathbf{x}$. This yields $H(\mathbf{y}) = B \log_2 \det[\pi e \mathbf{R}_y]$ and $H(\mathbf{n}) = B \log_2 \det[\pi e \mathbf{I}_{M_r}]$, resulting in
the mutual information

\[ I(\mathbf{X};\mathbf{Y}) = B \log_2 \det\left[ \mathbf{I}_{M_r} + \mathbf{H}\mathbf{R}_x\mathbf{H}^H \right]. \qquad (10.7) \]

This formula was derived in [3, 5] for the mutual information of a multiantenna system, and also appeared in earlier
works on MIMO systems [6, 7] and matrix models for ISI channels [8, 9].

The MIMO capacity is achieved by maximizing the mutual information (10.7) over all input covariance
matrices $\mathbf{R}_x$ satisfying the power constraint:

\[ C = \max_{\mathbf{R}_x : \mathrm{Tr}(\mathbf{R}_x) = \rho} B \log_2 \det\left[ \mathbf{I}_{M_r} + \mathbf{H}\mathbf{R}_x\mathbf{H}^H \right], \qquad (10.8) \]

where $\det[\mathbf{A}]$ denotes the determinant of the matrix $\mathbf{A}$. Clearly the optimization relative to $\mathbf{R}_x$ will depend on
whether or not $\mathbf{H}$ is known at the transmitter. We now consider this maximization under different assumptions about
transmitter CSI.

³ Entropy was defined in Chapter 4.1 for scalar random variables, but the definition is identical for random vectors.

Channel Known at Transmitter: Waterfilling

The MIMO decomposition described in Section 10.2 allows a simple characterization of the MIMO channel
capacity for a fixed channel matrix $\mathbf{H}$ known at the transmitter and receiver. Specifically, the capacity equals the sum
of capacities on each of the independent parallel channels with the transmit power optimally allocated between
these channels. This optimization of transmit power across the independent channels results from optimizing the
input covariance matrix to maximize the capacity formula (10.8). Substituting the matrix SVD (10.2) into (10.8)
and using properties of unitary matrices we get the MIMO capacity with CSIT and CSIR as

\[ C = \max_{\rho_i : \sum_i \rho_i \le \rho} \sum_{i=1}^{R_H} B \log_2\left(1 + \sigma_i^2 \rho_i\right). \qquad (10.9) \]

Since $\rho = P/\sigma_n^2$, the capacity (10.9) can also be expressed in terms of the power allocation $P_i$ to the $i$th parallel
channel as

\[ C = \max_{P_i : \sum_i P_i \le P} \sum_{i=1}^{R_H} B \log_2\left(1 + \frac{\sigma_i^2 P_i}{\sigma_n^2}\right) = \max_{P_i : \sum_i P_i \le P} \sum_{i=1}^{R_H} B \log_2\left(1 + \frac{P_i \gamma_i}{P}\right), \qquad (10.10) \]

where $\rho_i = P_i/\sigma_n^2$ and $\gamma_i = \sigma_i^2 P/\sigma_n^2$ is the SNR associated with the $i$th channel at full power. This capacity
formula is the same as in the case of flat fading (4.9) or in frequency-selective fading (4.23). Solving the optimization
leads to a water-filling power allocation for the MIMO channel:

\[ \frac{P_i}{P} = \begin{cases} \dfrac{1}{\gamma_0} - \dfrac{1}{\gamma_i} & \gamma_i \ge \gamma_0 \\ 0 & \gamma_i < \gamma_0 \end{cases} \qquad (10.11) \]

for some cutoff value $\gamma_0$. The resulting capacity is then

\[ C = \sum_{i : \gamma_i \ge \gamma_0} B \log_2(\gamma_i/\gamma_0). \qquad (10.12) \]



Example 10.2: Find the capacity and optimal power allocation for the MIMO channel given in the previous
example, assuming $\rho = P/\sigma_n^2 = 10$ dB and $B = 1$ Hz.

Solution: From the previous example, the singular values of the channel are $\sigma_1 = 1.3333$, $\sigma_2 = 0.5129$, and
$\sigma_3 = 0.0965$. Since $\gamma_i = 10\sigma_i^2$, this yields $\gamma_1 = 17.77$, $\gamma_2 = 2.63$, and $\gamma_3 = .093$. Assuming that power is
allocated to all three parallel channels, the power constraint yields

\[ \sum_{i=1}^{3}\left(\frac{1}{\gamma_0} - \frac{1}{\gamma_i}\right) = 1 \;\Rightarrow\; \frac{3}{\gamma_0} = 1 + \sum_{i=1}^{3}\frac{1}{\gamma_i} = 12.175. \]

Solving for $\gamma_0$ yields $\gamma_0 = .246$, which is inconsistent since $\gamma_3 = .093 < \gamma_0 = .246$. Thus, the third channel is
not allocated any power. Then the power constraint yields

\[ \sum_{i=1}^{2}\left(\frac{1}{\gamma_0} - \frac{1}{\gamma_i}\right) = 1 \;\Rightarrow\; \frac{2}{\gamma_0} = 1 + \sum_{i=1}^{2}\frac{1}{\gamma_i} = 1.436. \]

Solving for $\gamma_0$ for this case yields $\gamma_0 = 1.392 < \gamma_2$, so this is the correct cutoff value. Then $P_i/P = 1/1.392 - 1/\gamma_i$,
so $P_1/P = .662$ and $P_2/P = .338$. The capacity is given by $C = \log_2(\gamma_1/\gamma_0) + \log_2(\gamma_2/\gamma_0) = 4.59$.
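
The cutoff search used in this example (try all channels, drop the weakest whenever it falls below the cutoff, and repeat) can be sketched as follows. This is one straightforward way to solve (10.11) under the assumption that all $\gamma_i$ are strictly positive; the function names `water_filling` and `capacity` are illustrative.

```python
import math


def water_filling(gammas):
    """Return (gamma_0, [P_i / P]) for the water-filling rule (10.11).
    Assumes every gamma_i is strictly positive."""
    order = sorted(range(len(gammas)), key=lambda i: gammas[i], reverse=True)
    active = len(order)
    while active > 0:
        used = order[:active]
        # Power constraint on the active set: active/gamma_0 = 1 + sum(1/gamma_i)
        gamma_0 = active / (1.0 + sum(1.0 / gammas[i] for i in used))
        if gammas[used[-1]] >= gamma_0:   # weakest active channel is above cutoff
            break
        active -= 1                       # drop the weakest channel and retry
    fractions = [0.0] * len(gammas)
    for i in order[:active]:
        fractions[i] = 1.0 / gamma_0 - 1.0 / gammas[i]
    return gamma_0, fractions


def capacity(gammas, gamma_0, bandwidth=1.0):
    """Capacity (10.12): sum of B log2(gamma_i / gamma_0) over active channels."""
    return sum(bandwidth * math.log2(g / gamma_0) for g in gammas if g >= gamma_0)


sigmas = [1.3333, 0.5129, 0.0965]          # singular values from Example 10.1
gammas = [10.0 * s ** 2 for s in sigmas]   # rho = 10 dB, i.e. a factor of 10
gamma_0, fractions = water_filling(gammas)
C = capacity(gammas, gamma_0)
```

Run on the singular values of Example 10.1, the sketch drops the third channel and settles on the same cutoff, power split, and capacity as the worked solution.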



Capacity under perfect CSIT and CSIR can also be defined on channels where there is a single antenna at
the transmitter and multiple receive antennas (single-input multiple-output or SIMO) or multiple transmit antennas
and a single receive antenna (multiple-input single-output or MISO). These channels can only obtain diversity gain
from the multiple antennas. When both transmitter and receiver know the channel the capacity equals that of a
SISO channel with the signal transmitted or received over the multiple antennas coherently combined to maximize
the channel SNR, as in MRC. This results in capacity $C = B \log_2(1 + \rho\,|\mathbf{h}\mathbf{c}|^2) = B \log_2(1 + \rho\,\|\mathbf{h}\|^2)$, with the channel matrix $\mathbf{H}$ reduced
to a vector $\mathbf{h}$ of channel gains, the optimal weight vector $\mathbf{c} = \mathbf{h}^*/\|\mathbf{h}\|$, and $\rho = P/\sigma_n^2$.



Channel Unknown at Transmitter: Uniform Power Allocation 



Suppose now that the receiver knows the channel but the transmitter does not. Without channel information, the
transmitter cannot optimize its power allocation or input covariance structure across antennas. If the distribution
of $\mathbf{H}$ follows the ZMSW channel gain model, there is no bias in terms of the mean or covariance of $\mathbf{H}$. Thus, it
seems intuitive that the best strategy should be to allocate equal power to each transmit antenna, resulting in an
input covariance matrix equal to the scaled identity matrix: $\mathbf{R}_x = \frac{\rho}{M_t}\mathbf{I}_{M_t}$. It is shown in [4] that under these
assumptions this input covariance matrix indeed maximizes the mutual information of the channel. For an
$M_t$-transmit, $M_r$-receive antenna system, this yields mutual information given by

\[ I = B \log_2 \det\left[ \mathbf{I}_{M_r} + \frac{\rho}{M_t}\mathbf{H}\mathbf{H}^H \right]. \qquad (10.13) \]

Using the SVD of $\mathbf{H}$, we can express this as

\[ I = \sum_{i=1}^{R_H} B \log_2\left(1 + \frac{\gamma_i}{M_t}\right), \]

where $\gamma_i = \sigma_i^2 \rho = \sigma_i^2 P/\sigma_n^2$ and $R_H$ is the number of nonzero singular values of $\mathbf{H}$.
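
As a quick numerical illustration of the equal-power expression, the sketch below evaluates the sum over singular values for the channel of Example 10.1 at $\rho = 10$ dB. The function name is illustrative, and the choice of that particular channel and SNR is an assumption made only to allow comparison with the water-filling example.

```python
import math


def equal_power_mutual_information(sigmas, rho, m_t, bandwidth=1.0):
    """I = sum_i B log2(1 + gamma_i / M_t) with gamma_i = sigma_i^2 * rho,
    summed over the nonzero singular values."""
    return sum(bandwidth * math.log2(1.0 + (s ** 2) * rho / m_t)
               for s in sigmas if s > 0)


# Singular values from Example 10.1, rho = 10 (10 dB), M_t = 3, B = 1 Hz
I_equal = equal_power_mutual_information([1.3333, 0.5129, 0.0965], rho=10.0, m_t=3)
```

For this channel the equal-power mutual information comes out at roughly 3.7 bps, below the 4.59 bps obtained with water-filling, since a third of the power is spent on the very weak third channel.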

The mutual information of the MIMO channel (10.13) depends on the specific realization of the matrix $\mathbf{H}$, in
particular its singular values $\{\sigma_i\}$. The average mutual information of a random matrix $\mathbf{H}$, averaged over the matrix
distribution, depends on the probability distribution of the singular values of $\mathbf{H}$ [5, 13, 11]. In fading channels the
transmitter can transmit at a rate equal to this average mutual information and ensure correct reception of the data,
as discussed in the next section. But for a static channel, if the transmitter does not know the channel realization
or, more precisely, the channel's average mutual information then it does not know at what rate to transmit such
that the data will be received correctly. In this case the appropriate capacity definition is capacity with outage. In
capacity with outage the transmitter fixes a transmission rate $C$, and the outage probability associated with $C$ is the
probability that the transmitted data will not be received correctly or, equivalently, the probability that the channel
$\mathbf{H}$ has mutual information less than $C$. This probability is given by

\[ P_{out} = p\left( \mathbf{H} : B \log_2 \det\left[ \mathbf{I}_{M_r} + \frac{\rho}{M_t}\mathbf{H}\mathbf{H}^H \right] < C \right). \qquad (10.14) \]
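
The outage probability (10.14) rarely has a convenient closed form, but it can be estimated by simulation. The following is a minimal Monte Carlo sketch, assuming NumPy is available and that $\mathbf{H}$ follows the ZMSW model; the function names and the number of trials are illustrative choices.

```python
import numpy as np


def mutual_information_samples(m_t, m_r, rho, trials, rng, bandwidth=1.0):
    """Sample B log2 det[I + (rho/M_t) H H^H] for ZMSW channel matrices H."""
    out = np.empty(trials)
    for k in range(trials):
        h = (rng.standard_normal((m_r, m_t))
             + 1j * rng.standard_normal((m_r, m_t))) / np.sqrt(2.0)
        gram = np.eye(m_r) + (rho / m_t) * (h @ h.conj().T)
        _, logdet = np.linalg.slogdet(gram)      # natural log of the determinant
        out[k] = bandwidth * logdet / np.log(2.0)
    return out


def outage_probability(samples, rate):
    """Fraction of channel draws whose mutual information falls below `rate`."""
    return float(np.mean(samples < rate))


samples = mutual_information_samples(4, 4, rho=10.0, trials=2000,
                                     rng=np.random.default_rng(0))
p_out = outage_probability(samples, rate=9.0)    # outage estimate at C = 9 bps/Hz
```

Sweeping `rate` over a range of values traces out the empirical distribution of the mutual information, from which the rate supporting a given outage probability can be read off.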



As the number of transmit and receive antennas grows large, random matrix theory provides a central limit
theorem for the distribution of the singular values of $\mathbf{H}$ [14], resulting in a constant mutual information for all
channel realizations. These results were applied to obtain MIMO channel capacity with uncorrelated fading in
[15, 39, 17, 18] and with correlated fading in [19, 20, 12]. As an example of this limiting distribution, note that for
fixed $M_r$, under the ZMSW model the law of large numbers implies that

\[ \lim_{M_t \to \infty} \frac{1}{M_t}\mathbf{H}\mathbf{H}^H = \mathbf{I}_{M_r}. \qquad (10.15) \]

Substituting this into (10.13) yields that the mutual information in the asymptotic limit of large $M_t$ becomes a
constant equal to $C = M_r B \log_2(1 + \rho)$. Defining $M = \min(M_t, M_r)$, this implies that as $M$ grows large, the
MIMO channel capacity in the absence of CSIT approaches $C = MB \log_2(1 + \rho)$, and hence grows linearly in $M$.
Moreover, this linear growth of capacity with $M$ in the asymptotic limit of large $M$ is observed even for a small
number of antennas [20]. Similarly, as SNR grows large, capacity also grows linearly with $M = \min(M_t, M_r)$
for any $M_t$ and $M_r$ [2]. These results are the main reason for the widespread appeal of MIMO techniques: even
if the channel realization is not known at the transmitter, the capacity of MIMO channels still grows linearly with
the minimum number of transmit and receive antennas, as long as the channel can be accurately estimated at
the receiver. Thus, MIMO channels can provide very high data rates without requiring increased signal power or
bandwidth. Note, however, that at very low SNRs transmit antennas are not beneficial: capacity only scales with
the number of receive antennas independent of the number of transmit antennas. The reason is that at these low
SNRs, the MIMO system is just trying to collect energy rather than exploit all available dimensions, so all energy
is concentrated into one of the available transmit antennas to achieve capacity [4].
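
The law-of-large-numbers limit (10.15) is easy to observe numerically. The sketch below, which assumes the ZMSW model and uses illustrative function names, forms $\frac{1}{M_t}\mathbf{H}\mathbf{H}^H$ for a single channel draw and measures how far it is from the identity.

```python
import math
import random


def normalized_gram(m_r, m_t, rng):
    """Return (1/M_t) H H^H for one M_r x M_t ZMSW channel draw."""
    s = math.sqrt(0.5)
    h = [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m_t)]
         for _ in range(m_r)]
    return [[sum(a * b.conjugate() for a, b in zip(h[i], h[j])) / m_t
             for j in range(m_r)] for i in range(m_r)]


def distance_from_identity(g):
    """Frobenius norm of G - I."""
    size = len(g)
    return math.sqrt(sum(abs(g[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(size) for j in range(size)))


def asymptotic_capacity(m_r, rho, bandwidth=1.0):
    """Large-M_t limit of (10.13): C = M_r B log2(1 + rho)."""
    return m_r * bandwidth * math.log2(1.0 + rho)


rng = random.Random(0)
d_large = distance_from_identity(normalized_gram(2, 4000, rng))  # M_r = 2, M_t = 4000
```

With $M_r = 2$ and several thousand transmit antennas the distance is already small, since each entry of the normalized Gram matrix has variance of order $1/M_t$.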

While lack of CSIT does not affect the growth rate of capacity relative to M, at least for a large number of 
antennas, it does complicate demodulation. Specifically, without CSIT the transmission scheme cannot convert 
the MIMO channel into non-interfering SISO channels. Recall that the decoding complexity is exponential in the 
number of independent symbols transmitted over the multiple transmit antennas, and this number equals the rank 
of the input covariance matrix. 

The above analysis under perfect CSIR and no CSIT assumes that the channel gain matrix has a ZMSW 
distribution, i.e. it has mean zero and covariance matrix equal to the identity matrix. When the channel has nonzero 
mean or a non-identity covariance matrix, there is a spatial bias in the channel that should be exploited by the 
optimal transmission strategy, so equal power allocation across antennas is no longer optimal [23, 24, 25]. Results 
in [25, 26] indicate that when the channel has a dominant mean or covariance direction, beamforming, described 
in Section 10.4, achieves channel capacity. This is a fortuitous situation due to the simplicity of beamforming. 

10.3.2 Fading Channels 

Suppose now that the channel gain matrix experiences flat-fading, so the gains $h_{ij}$ vary with time. As in the case of
the static channel, the capacity depends on what is known about the channel matrix at the transmitter and receiver.
With perfect CSIR and CSIT the transmitter can adapt to the channel fading and its capacity equals the average
over all channel matrix realizations with optimal power allocation. With CSIR and no CSIT outage capacity is used
to characterize the outage probability associated with any given channel rate. These different characterizations are
described in more detail in the following sections.

Channel Known at Transmitter: Water-Filling 

With CSIT and CSIR the transmitter optimizes its transmission strategy for each fading channel realization as
in the case of a static channel. The capacity is then just the average of capacities associated with each channel
realization, given by (10.8), with power optimally allocated. This average capacity is called the ergodic capacity of
the channel. There are two possibilities for allocating power under ergodic capacity. A short-term power constraint
assumes that the power associated with each channel realization must equal the average power constraint $P$. In
this case the ergodic capacity becomes

\[
C = E_{\mathbf{H}}\left[ \max_{\mathbf{R}_x : \mathrm{Tr}(\mathbf{R}_x) = \rho} B \log_2 \det\left[ \mathbf{I}_{M_r} + \mathbf{H}\mathbf{R}_x\mathbf{H}^H \right] \right]
= E_{\mathbf{H}}\left[ \max_{P_i : \sum_i P_i \le P} \sum_i B \log_2\left(1 + \frac{P_i \gamma_i}{P}\right) \right]. \qquad (10.16)
\]

A less restrictive constraint is a long-term power constraint, where we can use different powers for different channel
realizations subject to the average power constraint over all channel realizations. The ergodic capacity under this
assumption is given by

\[
C = \max_{\rho_{\mathbf{H}} : E[\rho_{\mathbf{H}}] = \rho} E_{\mathbf{H}}\left[ \max_{\mathbf{R}_x : \mathrm{Tr}(\mathbf{R}_x) = \rho_{\mathbf{H}}} B \log_2 \det\left[ \mathbf{I}_{M_r} + \mathbf{H}\mathbf{R}_x\mathbf{H}^H \right] \right]. \qquad (10.17)
\]

The short-term power constraint gives rise to a water-filling in space across the antennas, whereas the long-term
power constraint allows for a two-dimensional water-filling across both space and time, similar to the
frequency-time water-filling associated with the capacity of a time-varying frequency-selective fading channel.

Channel Unknown at Transmitter: Ergodic Capacity and Capacity with Outage 

Consider now a time-varying channel with random matrix H known at the receiver but not the transmitter. The 
transmitter assumes a ZMSW distribution for H. The two relevant capacity definitions in this case are ergodic 
capacity and capacity with outage. Ergodic capacity defines the maximum rate, averaged over all channel real- 
izations, that can be transmitted over the channel for a transmission strategy based only on the distribution of H. 
This leads to the transmitter optimization problem - i.e., finding the optimum input covariance matrix to maxi- 
mize ergodic capacity subject to the transmit power constraint. Mathematically, the problem is to characterize the 
optimum R x to maximize 



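\[ C = \max_{\mathbf{R}_x : \mathrm{Tr}(\mathbf{R}_x) = \rho} E_{\mathbf{H}}\left[ B \log_2 \det\left[ \mathbf{I}_{M_r} + \mathbf{H}\mathbf{R}_x\mathbf{H}^H \right] \right], \qquad (10.18) \]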



where the expectation is with respect to the distribution on the channel matrix H, which for the ZMSW model is 
i.i.d. zero-mean circularly symmetric unit variance. 

As in the case of scalar channels, the optimum input covariance matrix that maximizes ergodic capacity for
the ZMSW model is the scaled identity matrix $\mathbf{R}_x = \frac{\rho}{M_t}\mathbf{I}_{M_t}$, i.e. the transmit power is divided equally among all
the transmit antennas and independent symbols are sent over the different antennas. Thus the ergodic capacity is
given by:

\[ C = E_{\mathbf{H}}\left[ B \log_2 \det\left[ \mathbf{I}_{M_r} + \frac{\rho}{M_t}\mathbf{H}\mathbf{H}^H \right] \right]. \qquad (10.19) \]

Since the capacity of the static channel grows linearly with $M = \min(M_t, M_r)$ for $M$ large, this will also be true of the
ergodic capacity since it just averages the static channel capacity. Expressions for the growth rate constant can be
found in [4, 27]. When the channel is not ZMSW, capacity depends on the distribution of the singular values for
the random channel matrix: these distributions and the resulting ergodic capacity in this more general setting are
studied in [13].

The ergodic capacity of a 4 x 4 MIMO system with i.i.d. complex Gaussian channel gains is shown in
Figure 10.4. This figure shows capacity with both transmitter and receiver CSI and with receiver CSI only. There
is little difference between the two, and this difference decreases with SNR, which is also the case for a SISO
channel. Comparing the capacity of this channel to that of a SISO fading channel shown in Figure 4.7, we see
that the MIMO ergodic capacity is 4 times larger than the SISO ergodic capacity, which is as expected since
$\min(M_t, M_r) = 4$.
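
The ergodic capacity (10.19) with receiver CSI only can be estimated by averaging over simulated channel draws. The sketch below assumes NumPy is available and a ZMSW channel; the function name, trial count, and seed are illustrative, and the result is a Monte Carlo estimate rather than an exact value.

```python
import numpy as np


def ergodic_capacity(m_t, m_r, rho, trials=2000, seed=0, bandwidth=1.0):
    """Monte Carlo estimate of E_H[B log2 det(I + (rho/M_t) H H^H)]
    for i.i.d. zero-mean unit-variance complex Gaussian entries of H."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        h = (rng.standard_normal((m_r, m_t))
             + 1j * rng.standard_normal((m_r, m_t))) / np.sqrt(2.0)
        # Eigenvalues of H H^H are the squared singular values of H
        eig = np.clip(np.linalg.eigvalsh(h @ h.conj().T), 0.0, None)
        total += bandwidth * np.sum(np.log2(1.0 + (rho / m_t) * eig))
    return total / trials


c_mimo = ergodic_capacity(4, 4, rho=10.0, trials=500)   # 4 x 4 at 10 dB
c_siso = ergodic_capacity(1, 1, rho=10.0, trials=500)   # SISO Rayleigh at 10 dB
```

Comparing `c_mimo` with `c_siso` at the same SNR gives a rough numerical check on the scaling with $\min(M_t, M_r)$ discussed above.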

When the channel gain matrix is unknown at the transmitter and the entries are complex Gaussian but not
i.i.d. then the channel mean or covariance matrix can be used at the transmitter to increase capacity. The basic idea
is to allocate power according to the mean or covariance. This channel model is sometimes referred to as mean
or covariance feedback. This model assumes perfect receiver CSI, and the impact of correlated fading depends on
what is known at the transmitter: if the transmitter knows the channel realization or doesn't know the realization
or the correlation structure then antenna correlation decreases capacity relative to i.i.d. fading. However, if the
transmitter knows the correlation structure then capacity is increased relative to i.i.d. fading. Details on capacity
under these different conditions can be found in [28, 25, 26].

Figure 10.4: Ergodic Capacity of 4 x 4 MIMO Channel.

Capacity with outage is defined similarly to the definition for static channels described in Section 10.3.1,
although now capacity with outage applies to a slowly-varying channel where the channel matrix $\mathbf{H}$ is constant
over a relatively long transmission time, then changes to a new value. As in the static channel case, the channel
realization and corresponding channel capacity is not known at the transmitter, yet the transmitter must still fix a
transmission rate to send data over the channel. For any choice of this rate $C$, there will be an outage probability
associated with $C$, which defines the probability that the transmitted data will not be received correctly. The outage
probability is the same as in the static case, given by (10.14). The outage capacity can sometimes be improved
by not allocating power to one or more of the transmit antennas, especially when the outage probability is high
[4]. This is because outage capacity depends on the tail of the probability distribution. With fewer antennas, less
averaging takes place and the spread of the tail increases.

The capacity with outage of a 4 x 4 MIMO system with i.i.d. complex Gaussian channel gains is shown in 
Figure 10.5 for outage of 1% and 10%. We see that the difference in outage capacity for these two outage proba- 
bilities increases with SNR. This can be explained from the distribution curves for capacity shown in Figure 10.6. 
These curves show that at low SNRs, the distribution is very steep, so that the capacity with outage at 1% is very 
close to that at 10% outage. At higher SNRs the curves become less steep, leading to more of a capacity difference 
at different outage probabilities. 

No CSI at the Transmitter or Receiver 

When there is no CSI at either the transmitter or receiver, the linear growth in capacity as a function of the num- 
ber of transmit and receive antennas disappears, and in some cases adding additional antennas provides negligible 
capacity gain. Moreover, channel capacity becomes heavily dependent on the underlying channel model, which 
makes it difficult to make generalizations about capacity growth. For an i.i.d. block fading channel it is shown 
in [33] that increasing the number of transmit antennas by more than the duration of the block does not increase 
capacity. Thus, there is no data rate increase beyond a certain number of transmit antennas. However, when fading 
is correlated, additional transmit antennas do increase capacity [29]. These results were extended in [34] to explicitly
characterize capacity and the capacity-achieving transmission strategy for this model in the high SNR regime.
Similar results were obtained for a block-Markov fading model in [35]. However, a general analysis in [36] indicates
that these results are highly dependent on the structure of the fading process; when this structure is removed
and a general fading process is considered, in the high SNR regime capacity grows only doubly logarithmically
with SNR, and the number of antennas adds at most a constant factor to this growth term. In other words, there is
no multiplexing gain associated with multiple antennas when there is no transmitter or receiver CSI.

Figure 10.5: Capacity with Outage of a 4 x 4 MIMO Channel.

Figure 10.6: Outage Probability Distribution of a 4 x 4 MIMO Channel.

10.4 MIMO Diversity Gain: Beamforming 

The multiple antennas at the transmitter and receiver can be used to obtain diversity gain instead of capacity gain.
In this setting, the same symbol, weighted by a complex scale factor, is sent over each transmit antenna, so that the
input covariance matrix has unit rank. This scheme is also referred to as MIMO beamforming^4. A beamforming
strategy corresponds to the precoding and receiver matrices described in Section 10.2 being just column vectors:
$\mathbf{V} = \mathbf{v}$ and $\mathbf{U} = \mathbf{u}$, as shown in Figure 10.7. As indicated in the figure, the transmit symbol $x$ is sent over the $i$th
antenna with weight $v_i$. On the receive side, the signal received on the $i$th antenna is weighted by $u_i^*$. Both transmit
and receive weight vectors are normalized so that $\|\mathbf{u}\| = \|\mathbf{v}\| = 1$. The resulting received signal is given by

$$y = \mathbf{u}^* \mathbf{H} \mathbf{v} x + \mathbf{u}^* \mathbf{n}, \tag{10.20}$$

where if $\mathbf{n} = (n_1, \ldots, n_{M_r})$ has i.i.d. elements, the statistics of $\mathbf{u}^* \mathbf{n}$ are the same as the statistics for each of these
elements.

^4 Unfortunately, beamforming is also used in the smart antenna context of Section 10.8 to describe adjustment of the transmit or receive
antenna directivity in a given direction.




Figure 10.7: MIMO Channel with Beamforming. 

Beamforming provides diversity gain by coherent combining of the multiple signal paths. Channel knowledge
at the receiver is typically assumed since this is required for coherent combining. The diversity gain then depends
on whether or not the channel is known at the transmitter. When the channel matrix $\mathbf{H}$ is known, the received
SNR is optimized by choosing $\mathbf{u}$ and $\mathbf{v}$ as the principal left and right singular vectors of the channel matrix $\mathbf{H}$.
The corresponding received SNR can be shown to equal $\gamma = \lambda_{\max} \rho$, where $\lambda_{\max}$ is the largest eigenvalue of
the Wishart matrix $\mathbf{W} = \mathbf{H}\mathbf{H}^H$ [21]. The resulting capacity is $C = B \log_2(1 + \lambda_{\max} \rho)$, corresponding to the
capacity of a SISO channel with channel power gain $\lambda_{\max}$. When the channel is not known at the transmitter, the
transmit antenna weights are all equal, so the received SNR equals $\gamma = \|\mathbf{H}\mathbf{u}^*\|$, where $\mathbf{u}$ is chosen to maximize $\gamma$.
Clearly the lack of transmitter CSI will result in a lower SNR and capacity than with optimal transmit weighting.
While beamforming has a reduced capacity relative to optimizing the transmit precoding and receiver shaping
matrices, the optimal demodulation complexity with beamforming is of the order of $|\mathcal{X}|$ instead of $|\mathcal{X}|^{R_H}$. An
even simpler strategy is to use MRC at either the transmitter or receiver and antenna selection on the other end:
this was analyzed in [22].



Example 10.3: Consider a MIMO channel with gain matrix

$$\mathbf{H} = \begin{bmatrix} .7 & .9 & .8 \\ .3 & .8 & .2 \\ .1 & .3 & .9 \end{bmatrix}.$$

Find the capacity of this channel under beamforming assuming channel knowledge at the transmitter and receiver,
$B = 100$ KHz, and $\rho = 10$ dB.

Solution: The Wishart matrix for $\mathbf{H}$ is

$$\mathbf{W} = \mathbf{H}\mathbf{H}^H = \begin{bmatrix} 1.94 & 1.09 & 1.06 \\ 1.09 & .77 & .45 \\ 1.06 & .45 & .91 \end{bmatrix},$$

and the largest eigenvalue of this matrix is $\lambda_{\max} = 3.17$. Thus, $C = B \log_2(1 + \lambda_{\max} \rho) = 10^5 \log_2(1 + 31.7) =
503$ Kbps.
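
The arithmetic in Example 10.3 can be checked numerically. The following is a minimal NumPy sketch rather than part of the original text; the variable names are chosen here for illustration. It takes the principal singular vectors of H as the beamforming weights u and v and confirms that the resulting scalar channel gain matches the largest eigenvalue of the Wishart matrix.

```python
import numpy as np

# Channel gain matrix, bandwidth, and SNR taken from Example 10.3.
H = np.array([[0.7, 0.9, 0.8],
              [0.3, 0.8, 0.2],
              [0.1, 0.3, 0.9]])
B = 100e3                      # bandwidth in Hz
rho = 10 ** (10 / 10)          # 10 dB expressed as a linear ratio

# The principal left and right singular vectors serve as the weights u and v.
U, s, Vh = np.linalg.svd(H)
u = U[:, 0]
v = Vh[0].conj()

# Effective power gain |u* H v|^2 of the beamformed scalar channel.
gain = abs(u.conj() @ H @ v) ** 2

# Largest eigenvalue of the Wishart matrix W = H H^H.
lam_max = np.linalg.eigvalsh(H @ H.conj().T).max()

C = B * np.log2(1 + lam_max * rho)   # beamforming capacity in bps
print(f"lambda_max = {lam_max:.2f}, gain = {gain:.2f}, C = {C / 1e3:.0f} Kbps")
```

Using the SVD directly, instead of forming the Wishart matrix first, also yields the weight vectors themselves, which is what a transmitter with channel knowledge would actually need.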



10.5 Diversity/Multiplexing Tradeoffs 



The previous sections suggest two mechanisms for utilizing multiple antennas to improve wireless system per- 
formance. One option is to obtain capacity gain by decomposing the MIMO channel into parallel channels and 
multiplexing different data streams onto these channels. This capacity gain is also referred to as a multiplexing 
gain. However, the SNR associated with each of these channels depends on the singular values of the channel ma- 
trix. In capacity analysis this is taken into account by assigning a relatively low rate to these channels. However, 
practical signaling strategies for these channels will typically have poor performance, unless powerful channel 
coding techniques are employed. Alternatively, beamforming can be used, where the channel gains are coherently 
combined to obtain a very robust channel with high diversity gain. It is not necessary to use the antennas purely 
for multiplexing or diversity. Some of the space-time dimensions can be used for diversity gain, and the remaining 
dimensions used for multiplexing gain. This gives rise to a fundamental design question in MIMO systems: should 
the antennas be used for diversity gain, multiplexing gain, or both? 

The diversity/multiplexing tradeoff or, more generally, the tradeoff between data rate, probability of error, and 
complexity for MIMO systems has been extensively studied in the literature, from both a theoretical perspective 
and in terms of practical space-time code designs [50, 37, 38, 42]. This work has primarily focused on block 
fading channels with receiver CSI only since when both transmitter and receiver know the channel the tradeoff is 
relatively straightforward: antenna subsets can first be grouped for diversity gain and then the multiplexing gain 
corresponds to the new channel with reduced dimension due to the grouping. For the block fading model with 
receiver CSI only, as the blocklength grows asymptotically large, full diversity gain and full multiplexing gain (in 
terms of capacity with outage) can be obtained simultaneously with reasonable complexity by encoding diagonally 
across antennas [51, 52, 2]. An example of this type of encoding is D-BLAST, described in Section 10.6.4. For 
finite blocklengths it is not possible to achieve full diversity and full multiplexing gain simultaneously, in which 
case there is a tradeoff between these gains. A simple characterization of this tradeoff is given in [37] for block
fading channels with blocklength $T \ge M_t + M_r - 1$ in the limit of asymptotically high SNR. In this analysis a
transmission scheme is said to achieve multiplexing gain $r$ and diversity gain $d$ if the data rate (bps) per unit Hertz
$R(\text{SNR})$ and probability of error $P_e(\text{SNR})$ as functions of SNR satisfy

$$\lim_{\log_2 \text{SNR} \to \infty} \frac{R(\text{SNR})}{\log_2 \text{SNR}} = r, \tag{10.21}$$

and

$$\lim_{\log \text{SNR} \to \infty} \frac{\log P_e(\text{SNR})}{\log \text{SNR}} = -d, \tag{10.22}$$

where the log in (10.22) can be in any base^5. For each $r$ the optimal diversity gain $d_{opt}(r)$ is the maximum
diversity gain that can be achieved by any scheme. It is shown in [37] that if the fading blocklength exceeds the
total number of antennas at the transmitter and receiver, then

$$d_{opt}(r) = (M_t - r)(M_r - r), \quad 0 \le r \le \min(M_t, M_r). \tag{10.23}$$

^5 The base of the log cancels out of the expression since (10.22) is the ratio of two logs with the same base.

The function (10.23) is plotted in Fig. 10.8. Recall that in Chapter 7 we found that transmitter or receiver diversity
with $M$ antennas resulted in an error probability proportional to $\text{SNR}^{-M}$. The formula (10.23) implies that in a
MIMO system, if we use all transmit and receive antennas for diversity, we get an error probability proportional to
$\text{SNR}^{-M_t M_r}$ and that, moreover, we can use some of these antennas to increase data rate at the expense of diversity
gain.




Figure 10.8: Diversity-Multiplexing Tradeoff for High SNR Block Fading (horizontal axis: multiplexing gain r = R/log(SNR)).

It is also possible to adapt the diversity and multiplexing gains relative to channel conditions. Specifically, 
in poor channel states more antennas can be used for diversity gain, whereas in good states more antennas can be 
used for multiplexing. Adaptive techniques that change antenna use to trade off diversity and multiplexing based 
on channel conditions have been investigated in [39, 40, 41].



Example 10.4: Let the multiplexing and diversity parameters $r$ and $d$ be as defined in (10.21) and (10.22). Suppose
that $r$ and $d$ approximately satisfy the diversity/multiplexing tradeoff $d_{opt}(r) = (M_t - r)(M_r - r)$ at any large
finite SNR. For an $M_t = M_r = 8$ MIMO system with an SNR of 15 dB, if we require a data rate per unit Hertz of
$R = 15$ bps, what is the maximum diversity gain the system can provide?

Solution: With SNR = 15 dB, to get $R = 15$ we require $r \log_2(10^{1.5}) = 15$, which implies $r = 3.01$. Thus, three
of the antennas are used for multiplexing and the remaining five for diversity. The maximum diversity gain is then

$$d_{opt}(r) = (M_t - r)(M_r - r) = (8 - 3)(8 - 3) = 25.$$
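
The steps of Example 10.4 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `d_opt` is introduced here, and rounding r to the nearest integer mirrors the example's choice of three antennas for multiplexing rather than any general rule.

```python
import math

def d_opt(r, Mt, Mr):
    """Optimal diversity gain (10.23) for multiplexing gain r."""
    if not 0 <= r <= min(Mt, Mr):
        raise ValueError("r must lie between 0 and min(Mt, Mr)")
    return (Mt - r) * (Mr - r)

Mt = Mr = 8
snr_db = 15.0
R = 15.0                                   # required rate in bps per Hz

# From (10.21), R is approximately r * log2(SNR) at high SNR.
r = R / math.log2(10 ** (snr_db / 10))
r_int = round(r)                           # the example treats r = 3.01 as 3 antennas

print(f"r = {r:.2f}, d_opt({r_int}) = {d_opt(r_int, Mt, Mr)}")
```

Evaluating `d_opt` at the endpoints shows the two extremes of the tradeoff: r = 0 gives the full diversity gain of 64, while r = 8 gives full multiplexing gain and no diversity gain.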



10.6 Space-Time Modulation and Coding 

Since a MIMO channel has input-output relationship y = Hx + n, the symbol transmitted over the channel each 
symbol time is a vector rather than a scalar, as in traditional modulation for the SISO channel. Moreover, when 
the signal design extends over both space (via the multiple antennas) and time (via multiple symbol times), it is 
typically referred to as a space-time code. 

Most space-time codes, including all codes discussed in this section, are designed for quasi-static channels
where the channel is constant over a block of $T$ symbol times, and the channel is assumed unknown at the
transmitter. Under this model the channel input and output become matrices, with dimensions corresponding to space
(antennas) and time. Let $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_T]$ denote the $M_t \times T$ channel input matrix with $i$th column $\mathbf{x}_i$ equal to the
vector channel input over the $i$th transmission time. Let $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_T]$ denote the $M_r \times T$ channel output
matrix with $i$th column $\mathbf{y}_i$ equal to the vector channel output over the $i$th transmission time, and let $\mathbf{N} = [\mathbf{n}_1, \ldots, \mathbf{n}_T]$
denote the $M_r \times T$ noise matrix with $i$th column $\mathbf{n}_i$ equal to the receiver noise vector on the $i$th transmission time.
With this matrix representation the input-output relationship over the block of $T$ symbol times becomes

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}. \tag{10.24}$$

10.6.1 ML Detection and Pairwise Error Probability 

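Assume a space-time code where the receiver has knowledge of the channel matrix $\mathbf{H}$. Under ML detection it can
be shown, using techniques similar to those in the scalar (Chapter 5) or vector (Chapter 8) case, that given received matrix
$\mathbf{Y}$, the ML transmit matrix $\hat{\mathbf{X}}$ satisfies

$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X} \in \mathcal{X}^{M_t \times T}} \|\mathbf{Y} - \mathbf{H}\mathbf{X}\|_F^2 = \arg\min_{\mathbf{X} \in \mathcal{X}^{M_t \times T}} \sum_{i=1}^{T} \|\mathbf{y}_i - \mathbf{H}\mathbf{x}_i\|_F^2, \tag{10.25}$$

where $\|\mathbf{A}\|_F$ denotes the Frobenius norm^6 of the matrix $\mathbf{A}$ and the minimization is taken over all possible space-
time input matrices in $\mathcal{X}^{M_t \times T}$. The pairwise error probability for mistaking a transmit matrix $\mathbf{X}$ for another matrix $\hat{\mathbf{X}}$,
denoted as $p(\hat{\mathbf{X}} \to \mathbf{X})$, depends only on the distance between the two matrices after transmission through the
channel and the noise power, i.e.

$$p(\hat{\mathbf{X}} \to \mathbf{X}) = Q\left( \sqrt{\frac{\|\mathbf{H}(\mathbf{X} - \hat{\mathbf{X}})\|_F^2}{2\sigma_n^2}} \right). \tag{10.26}$$

Let $\mathbf{D}_X = \mathbf{X} - \hat{\mathbf{X}}$ denote the difference matrix between $\mathbf{X}$ and $\hat{\mathbf{X}}$. Applying the Chernoff bound to (10.26) yields

$$p(\hat{\mathbf{X}} \to \mathbf{X}) \le \exp\left( -\frac{\|\mathbf{H}\mathbf{D}_X\|_F^2}{4\sigma_n^2} \right). \tag{10.27}$$

Let $\mathbf{h}_i$ denote the $i$th row of $\mathbf{H}$, $i = 1, \ldots, M_r$. Then

$$\|\mathbf{H}\mathbf{D}_X\|_F^2 = \sum_{i=1}^{M_r} \mathbf{h}_i \mathbf{D}_X \mathbf{D}_X^H \mathbf{h}_i^H. \tag{10.28}$$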

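Let $\mathcal{H} = \text{vec}(\mathbf{H}^T)^T$, where $\text{vec}(\mathbf{A})$ is defined as the vector that results from stacking the columns of matrix $\mathbf{A}$ on
top of each other to form a vector^7. So $\mathcal{H}^T$ is a vector of length $M_r M_t$. Also define $\mathcal{D}_X = \mathbf{I}_{M_r} \otimes \mathbf{D}_X$, where $\otimes$
denotes the Kronecker product. With these definitions,

$$\|\mathbf{H}\mathbf{D}_X\|_F^2 = \|\mathcal{D}_X^H \mathcal{H}^H \mathcal{H} \mathcal{D}_X\|_F. \tag{10.29}$$

Substituting (10.29) into (10.27) and taking the expectation relative to all possible channel realizations yields

$$p(\hat{\mathbf{X}} \to \mathbf{X}) \le \left( \frac{1}{\det\left[ \mathbf{I}_{M_t M_r} + \frac{1}{4\sigma_n^2} E\left[ \mathcal{D}_X^H \mathcal{H}^H \mathcal{H} \mathcal{D}_X \right] \right]} \right)^{M_r}. \tag{10.30}$$

^6 The Frobenius norm of a matrix is the square root of the sum of the squared magnitudes of its elements.

^7 So for the $M \times N$ matrix $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_N]$, where $\mathbf{a}_i$ is a vector of length $M$, $\text{vec}(\mathbf{A}) = [\mathbf{a}_1^T, \ldots, \mathbf{a}_N^T]^T$ is a vector of length $MN$.

Suppose that the channel matrix $\mathbf{H}$ is random and spatially white, so that its entries are i.i.d. zero-mean unit
variance complex Gaussian random variables. Then taking the expectation yields

$$p(\hat{\mathbf{X}} \to \mathbf{X}) \le \left( \frac{1}{\det\left[ \mathbf{I}_{M_t} + \frac{\gamma}{4M_t} \mathbf{\Delta} \right]} \right)^{M_r}, \tag{10.31}$$

where $\mathbf{\Delta} = \mathbf{D}_X \mathbf{D}_X^H$. This simplifies to

$$p(\hat{\mathbf{X}} \to \mathbf{X}) \le \prod_{k=1}^{N_\Delta} \left( \frac{1}{1 + \gamma \lambda_k(\mathbf{\Delta})/(4M_t)} \right)^{M_r}, \tag{10.32}$$

where $\gamma = E_s/\sigma_n^2$ is the SNR per input symbol $x$ or, equivalently, $\gamma/M_t$ is the SNR per antenna and $\lambda_k(\mathbf{\Delta})$ is
the $k$th nonzero eigenvalue of $\mathbf{\Delta}$, $k = 1, \ldots, N_\Delta$, where $N_\Delta$ is the rank of $\mathbf{\Delta}$. In the high SNR regime, i.e. for
$\gamma \gg 1$, this simplifies to

$$p(\hat{\mathbf{X}} \to \mathbf{X}) \le \left( \prod_{k=1}^{N_\Delta} \lambda_k(\mathbf{\Delta}) \right)^{-M_r} \left( \frac{\gamma}{4M_t} \right)^{-N_\Delta M_r}. \tag{10.33}$$

This equation gives rise to the main criteria for design of space-time codes, described in the next section.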
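
To see how closely the high-SNR form (10.33) tracks the bound (10.32), both can be evaluated for a specific difference matrix. The sketch below is an illustration only: the function name `pep_bounds` and the 2 x 2 difference matrix are hypothetical choices made here, not taken from the text.

```python
import numpy as np

def pep_bounds(D, gamma, Mr):
    """Evaluate the bounds (10.32) and (10.33) for a difference matrix D (Mt x T)."""
    Mt = D.shape[0]
    lam = np.linalg.eigvalsh(D @ D.conj().T)      # eigenvalues of Delta = D D^H
    lam = lam[lam > 1e-12]                        # keep only the nonzero eigenvalues
    n_delta = lam.size                            # rank of Delta
    bound_32 = np.prod(1.0 / (1.0 + gamma * lam / (4 * Mt))) ** Mr
    bound_33 = np.prod(lam) ** (-Mr) * (gamma / (4 * Mt)) ** (-n_delta * Mr)
    return bound_32, bound_33

# Hypothetical full-rank difference matrix for Mt = 2, with Mr = 2 receive antennas.
D = np.array([[2.0, 0.0],
              [0.0, 2.0]])
for snr_db in (10, 20, 30):
    b32, b33 = pep_bounds(D, 10 ** (snr_db / 10), Mr=2)
    print(f"{snr_db} dB: (10.32) = {b32:.3e}, (10.33) = {b33:.3e}")
```

Because (10.33) drops the 1 in each denominator of (10.32), it is always the looser of the two, and the gap closes as the SNR grows.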



10.6.2 Rank and Determinant Criterion 

The pairwise error probability in (10.33) indicates that the probability of error decreases as $\gamma^{-d}$ for $d = N_\Delta M_r$.
Thus, $N_\Delta M_r$ is the diversity gain of the space-time code. The maximum diversity gain possible through coherent
combining of $M_t$ transmit and $M_r$ receive antennas is $M_t M_r$. Thus, to obtain this maximum diversity gain, the
space-time code must be designed such that the $M_t \times M_t$ matrix $\mathbf{\Delta}$ associated with any two codewords has
full rank equal to $M_t$. This design criterion is referred to as the rank criterion.

The coding gain associated with the pairwise error probability in (10.33) depends on the first term $\left( \prod_{k=1}^{N_\Delta} \lambda_k(\mathbf{\Delta}) \right)^{-M_r}$.
Thus, a high coding gain is achieved by maximizing the minimum of the determinant of $\mathbf{\Delta}$ over all input matrix
pairs $\mathbf{X}$ and $\hat{\mathbf{X}}$. This criterion is referred to as the determinant criterion.

The rank and determinant criteria were first developed in [43, 50, 44]. These criteria are based on the pairwise
error probability associated with different transmit signal matrices, rather than the binary domain of traditional
codes, and hence often require computer searches to find good codes [45, 46]. General binary rank criteria were
developed in [47] to provide a better construction method for space-time codes.
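
The two criteria can be checked by brute force for a small code. The sketch below does this for the two-antenna Alamouti code with QPSK symbols; the code matrix is written in its commonly used form and the constellation scaling is an assumption made here for illustration.

```python
import itertools
import numpy as np

# Unit-energy QPSK constellation (an illustrative choice).
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def alamouti(s1, s2):
    """2 x 2 Alamouti code matrix: rows are antennas, columns are symbol times."""
    return np.array([[s1, -np.conj(s2)],
                     [s2,  np.conj(s1)]])

codewords = [alamouti(s1, s2) for s1, s2 in itertools.product(qpsk, repeat=2)]

min_rank, min_det = None, None
for X, X_hat in itertools.combinations(codewords, 2):
    D = X - X_hat                      # difference matrix D_X
    Delta = D @ D.conj().T             # Delta = D_X D_X^H
    rank = np.linalg.matrix_rank(Delta)
    det = np.linalg.det(Delta).real
    min_rank = rank if min_rank is None else min(min_rank, rank)
    min_det = det if min_det is None else min(min_det, det)

print(f"minimum rank = {min_rank}, minimum determinant = {min_det:.2f}")
```

A minimum rank equal to the number of transmit antennas means the rank criterion is met for every codeword pair, and the minimum determinant is the quantity the determinant criterion seeks to maximize.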



10.6.3 Space-Time Trellis and Block Codes 

The rank and determinant criteria have been primarily applied to the design of space-time trellis codes (STTCs).
STTCs are an extension of conventional trellis codes to MIMO systems [10, 44]. They are described using a trellis
and decoded using ML sequence estimation via the Viterbi algorithm. STTCs can extract excellent diversity and
coding gain, but the complexity of decoding increases exponentially with the diversity level and transmission rate
[48]. Space-time block codes (STBCs) are an alternative space-time code that can also extract excellent diversity
and coding gain with linear receiver complexity. Interest in STBCs was initiated by the Alamouti code described in
Section 7.3.2, which obtains full diversity order with linear receiver processing for a two-antenna transmit system.
This scheme was generalized in [49] to STBCs that achieve full diversity order with an arbitrary number of transmit
antennas. However, while these codes achieve full diversity order, they do not provide coding gain, and thus have
inferior performance to STTCs, which achieve both full diversity gain and coding gain. Added coding gain
for both STTCs and STBCs can be achieved by concatenating these codes either in serial or in parallel with an
outer channel code to form a turbo code [29, 32]. The linear complexity of the STBC designs in [49] results from
making the codes orthogonal along each dimension of the code matrix. A similar design premise is used in [53]
to design unitary space-time modulation schemes for block fading channels when neither the transmitter nor the
receiver has CSI. More comprehensive treatments of space-time coding can be found in [10, 54, 55, 48]
and the references therein.

10.6.4 Spatial Multiplexing and BLAST Architectures 

The basic premise of spatial multiplexing is to send $M_t$ independent symbols per symbol period using the
dimensions of space and time. In order to get full diversity order an encoded bit stream must be transmitted over all $M_t$
transmit antennas. This can be done through a serial encoding, illustrated in Figure 10.9. With serial encoding the
bit stream is temporally encoded over the channel blocklength $T$, interleaved, and mapped to a constellation point,
then demultiplexed onto the different antennas. If each codeword is sufficiently long, it can be transmitted over
all $M_t$ transmit antennas and received by all $M_r$ receive antennas, resulting in full diversity gain. However, the
codeword length $T$ required to achieve this full diversity is $M_t M_r$, and decoding complexity grows exponentially
with this codeword length. This high level of complexity makes serial encoding impractical.



Figure 10.9: Spatial Multiplexing with Serial Encoding.

A simpler method to achieve spatial multiplexing, pioneered at Bell Laboratories as one of the Bell Labs Layered
Space Time (BLAST) architectures for MIMO channels [2], is parallel encoding, illustrated in Figure 10.10.
With parallel encoding the data stream is demultiplexed into $M_t$ independent streams. Each of the resulting
substreams is passed through a SISO temporal encoder with blocklength $T$, interleaved, mapped to a signal constellation
point, and transmitted over its corresponding transmit antenna. This process can be considered to be the encoding
of the serial data into a vertical vector, and hence is also referred to as vertical encoding or V-BLAST [56]. Vertical
encoding can achieve at most a diversity order of $M_r$, since each coded symbol is transmitted from one antenna and
received by $M_r$ antennas. This system has a simple encoding complexity that is linear in the number of antennas.
However, optimal decoding still requires joint detection of the codewords from each of the transmit antennas, since
all transmitted symbols are received by all the receive antennas. It was shown in [57] that the receiver complexity
can be significantly reduced through the use of symbol interference cancellation, as shown in Figure 10.11. The
symbol interference cancellation, which exploits the synchronicity of the symbols transmitted from each antenna,
works as follows. First the $M_t$ transmitted symbols are ordered in terms of their received SNR. An estimate of the
received symbol with the highest SNR is made while treating all other symbols as noise. This estimated symbol
is subtracted out, and the symbol with the next highest SNR is estimated while treating the remaining symbols as
noise. This process repeats until all $M_t$ transmitted symbols have been estimated. After cancelling out interfering
symbols, the coded substream associated with each transmit antenna can be individually decoded, resulting in a
receiver complexity that is linear in the number of transmit antennas. In fact, coding is not even needed with
this architecture, and data rates of 20-40 bps/Hz with reasonable error rates were reported in [56] using uncoded
V-BLAST.
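
The ordered cancellation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the receiver of [57]: it uses zero-forcing nulling to form each symbol estimate (one common choice, whereas the text describes treating the other symbols as noise), it recomputes the ordering after every cancellation, and the function name `vblast_sic` and the QPSK test setup are introduced here.

```python
import numpy as np

def vblast_sic(H, y, constellation):
    """Detect uncoded V-BLAST symbols by ordered successive interference cancellation."""
    y = np.asarray(y, dtype=complex).copy()
    remaining = list(range(H.shape[1]))        # transmit antennas not yet detected
    x_hat = np.zeros(H.shape[1], dtype=complex)
    while remaining:
        G = np.linalg.pinv(H[:, remaining])    # zero-forcing nulling vectors (rows)
        # Post-detection SNR is inversely proportional to the squared row norm,
        # so the smallest norm marks the strongest remaining stream.
        k = int(np.argmin(np.sum(np.abs(G) ** 2, axis=1)))
        z = G[k] @ y                           # soft estimate of that symbol
        s = constellation[np.argmin(np.abs(constellation - z))]   # hard decision
        idx = remaining.pop(k)
        x_hat[idx] = s
        y = y - H[:, idx] * s                  # subtract the detected symbol
    return x_hat

# Small demonstration with a random 4 x 4 channel and QPSK symbols.
rng = np.random.default_rng(0)
Mt = Mr = 4
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
x = rng.choice(qpsk, size=Mt)
y_clean = H @ x                                # noise-free received vector
print(np.allclose(vblast_sic(H, y_clean, qpsk), x))
```

Each pass through the loop detects one symbol, so the number of detection stages grows linearly with the number of transmit antennas, in line with the complexity claim in the text.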



Figure 10.10: Spatial Multiplexing with Parallel Encoding: V-BLAST.

Figure 10.11: V-BLAST Receiver with Linear Complexity.



The simplicity of parallel encoding and the diversity benefits of serial encoding can be obtained using a creative
combination of the two techniques called diagonal encoding or D-BLAST [2], illustrated in Figure 10.12. In
D-BLAST, the data stream is first horizontally encoded. However, rather than transmitting the independent
codewords on separate antennas, the codeword symbols are rotated across antennas, so that a codeword is spread over
all $M_t$ antennas. The operation of the stream rotation is shown in Figure 10.13. Suppose the $i$th encoder generates
the codeword $\mathbf{x}_i = x_{i1}, \ldots, x_{iM_t}$. The stream rotator transmits each coded symbol on a different antenna, so $x_{i1}$
is sent on antenna 1, $x_{i2}$ is sent on antenna 2, and so forth. If the code blocklength $T$ exceeds $M_t$ then the rotation
begins again on the first antenna. As a result, the codeword is spread across all spatial dimensions. Transmission
schemes based on D-BLAST can achieve the full $M_t M_r$ diversity gain if the temporal coding with stream
rotation is capacity-achieving (Gaussian code books with infinite block size $T$) [10, Chapter 6.3.5]. Moreover, the
D-BLAST system can achieve the maximum capacity with outage if the wasted space-time dimensions along the
diagonals are neglected [10, Chapter 12.4.1]. Receiver complexity is also linear in the number of transmit
antennas, since the receiver decodes each diagonal code independently. However, this simplicity comes at a price, as
the efficiency loss of the wasted space-time dimensions illustrated in Figure 10.12 can be large if the frame size is
not appropriately chosen.

Figure 10.12: Diagonal Encoding with Stream Rotation.

Figure 10.13: Stream Rotation.

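The efficiency loss from wasted space-time dimensions in D-BLAST can be made concrete with a small layout sketch. The code below assumes one common way of drawing the diagonal layering, in which each codeword has blocklength $M_t$ and successive codewords start one symbol time apart; this staggering, along with the function name `dblast_layout`, is an assumption made here and may differ in detail from Figures 10.12 and 10.13.

```python
import numpy as np

def dblast_layout(num_layers, Mt):
    """Return an Mt x (num_layers + Mt - 1) grid of layer indices (-1 marks unused slots)."""
    grid = -np.ones((Mt, num_layers + Mt - 1), dtype=int)
    for layer in range(num_layers):
        for j in range(Mt):                # j-th symbol of this layer's codeword
            grid[j, layer + j] = layer     # sent on antenna j at symbol time layer + j
    return grid

grid = dblast_layout(num_layers=3, Mt=3)
efficiency = np.mean(grid >= 0)            # fraction of space-time slots carrying data
print(grid)
print(f"efficiency = {efficiency:.2f}")
```

In this layout the unused corner slots number $M_t(M_t - 1)$ regardless of how many layers are sent, so the fraction of wasted dimensions shrinks as the frame grows longer.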


10.7 Frequency-Selective MIMO Channels 

When the MIMO channel bandwidth is large relative to the channel’s multipath delay spread, the channel suffers 
from ISI, similar to the case of SISO channels. There are two approaches to dealing with ISI in MIMO channels. 
A channel equalizer can be used to mitigate the effects of ISI. However, the equalizer is much more complex in 
MIMO channels since the channel must be equalized over both space and time. Moreover, when the equalizer is 
used in conjunction with a space-time code, the nonlinear and noncausal nature of the code further complicates the
equalizer design. In some cases the structure of the code can be used to convert the MIMO equalization problem 
to a SISO problem for which well-established SISO equalizer designs can be used [58, 59, 60]. 

An alternative to equalization in frequency-selective fading is multicarrier modulation or orthogonal
frequency division multiplexing (OFDM). OFDM techniques for SISO channels are described in Chapter 12: the
main premise is to convert the wideband channel into a set of narrowband subchannels that only exhibit flat
fading. Applying OFDM to MIMO channels results in a set of narrowband MIMO channels, and the space-time
modulation and coding techniques described above for a single MIMO channel are applied to the parallel set.
MIMO frequency-selective fading channels exhibit diversity across space, time, and frequency, so ideally all three
dimensions should be fully exploited in the signaling scheme.

10.8 Smart Antennas 

We have seen that multiple antennas at the transmitter and/or receiver can provide diversity gain as well as increased 
data rates through space-time signal processing. Alternatively, sectorization or phased array techniques can be used 
to provide directional antenna gain at the transmit or receive antenna array. This directionality can increase the 
signaling range, reduce delay-spread (ISI) and flat-fading, and suppress interference between users. In particular, 
interference typically arrives at the receiver from different directions. Thus, directional antennas can exploit these 
differences to null or attenuate interference arriving from given directions, thereby increasing system capacity. The
reflected multipath components of the transmitted signal also arrive at the receiver from different directions, and
can also be attenuated, thereby reducing ISI and flat-fading. The benefits of directionality that can be obtained
with multiple antennas must be weighed against their potential diversity or multiplexing benefits, giving rise to a
multiplexing/diversity/directionality tradeoff analysis. Whether it is best to use the multiple antennas to increase
data rates through multiplexing, increase robustness to fading through diversity, or reduce ISI and interference
through directionality is a complex tradeoff decision that depends on the overall system design.

The most common directive antennas are sectorized or phased (directional) antenna arrays, and the gain pat- 
terns for these antennas along with an omnidirectional antenna gain pattern are shown in Figure 10.14. Sectorized 
antennas are designed to provide high gain across a range of signal arrival angles. Sectorization is commonly used 
at cellular system base stations to cut down on interference: if different sectors are assigned different frequencies 
or timeslots, then only users within a sector interfere with each other, thereby reducing the average interference 
by a factor equal to the number of sectors. For example, Figure 10.14 shows a sectorized antenna with a 120°
beamwidth. A base station could divide its 360° angular range into three sectors to be covered by three 120°
sectorized antennas, in which case the interference in each sector is reduced by a factor of 3 relative to an
omnidirectional base station antenna. The price paid for reduced interference in cellular systems via sectorization is the
need for handoff between sectors.

Figure 10.14: Antenna Gains for Omnidirectional, Sectorized, and Directive Antennas.

Directional antennas typically use antenna arrays coupled with phased array techniques to provide directional
gain, which can be tightly controlled with sufficiently many antenna elements. Phased array techniques work by
adapting the phase of each antenna element in the array, which changes the angular locations of the antenna beams
(angles with large gain) and nulls (angles with small gain). For an antenna array with $N$ antennas, $N$ nulls can be
formed to significantly reduce the received power of $N$ separate interferers. If there are $N_I < N$ interferers, then
the $N_I$ interferers can be cancelled out using $N_I$ antennas in a phased array, and the remaining $N - N_I$ antennas
can be used for diversity gain. Note that directional antennas must know the angular location of the desired and
interfering signals to provide high or low gains in the appropriate directions. Tracking of user locations can be a
significant impediment in highly mobile systems, which is why cellular base stations use sectorization instead of
directional antennas.
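
How per-element phase shifts move beams and nulls can be illustrated with the array factor of a simple array. The sketch below assumes a narrowband uniform linear array with half-wavelength element spacing, which is a modeling choice made here rather than something specified in the text; `array_factor` is a hypothetical helper name.

```python
import numpy as np

def array_factor(theta_deg, steer_deg, N, spacing=0.5):
    """Normalized gain magnitude of an N-element uniform linear array.

    theta_deg: direction of the arriving signal, measured from broadside.
    steer_deg: direction toward which the element phases steer the main beam.
    spacing:   element spacing in wavelengths.
    """
    n = np.arange(N)
    phase = 2 * np.pi * spacing * n
    weights = np.exp(-1j * phase * np.sin(np.radians(steer_deg)))   # per-element phase shifts
    response = np.exp(1j * phase * np.sin(np.radians(theta_deg)))   # array response to the arrival
    return abs(np.sum(weights * response)) / N

# Four elements steered to broadside: full gain at 0 degrees, a null at 30 degrees.
for angle in (0, 15, 30, 60):
    print(f"{angle:3d} deg: gain = {array_factor(angle, 0, 4):.3f}")
```

Changing `steer_deg` shifts the main beam, and with it the null locations, without any mechanical movement of the array, which is the sense in which phase adaptation controls directivity.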

The complexity of antenna array processing along with the required real estate of an antenna array make the
use of smart antennas in small, lightweight, low-power handheld devices unlikely in the near future. However, base
stations and access points already use antenna arrays in many cases. More details on the technology behind smart
antennas and their use in wireless systems can be found in [61].

Chapter 10 Problems



1. Matrix identities are commonly used in the analysis of MIMO channels. Prove the following matrix identities.

(a) Given an $M \times N$ matrix $\mathbf{A}$, show that the matrix $\mathbf{A}\mathbf{A}^H$ is Hermitian. What does this reveal about the
eigendecomposition of $\mathbf{A}\mathbf{A}^H$?

(b) Show that $\mathbf{A}\mathbf{A}^H$ is positive semidefinite.

(c) Show that $\mathbf{I}_M + \mathbf{A}\mathbf{A}^H$ is Hermitian positive definite.

(d) Show that $\det[\mathbf{I}_M + \mathbf{A}\mathbf{A}^H] = \det[\mathbf{I}_N + \mathbf{A}^H\mathbf{A}]$.

2. Find the SVD of the following matrix

$$\mathbf{H} = \begin{bmatrix} .7 & .6 & .2 & .4 \\ .1 & .5 & .9 & .2 \\ .3 & .6 & .9 & .1 \end{bmatrix}.$$

3. Find a 3 x 3 channel matrix $\mathbf{H}$ with 2 nonzero singular values.

4. Consider the 4 x 4 MIMO channels given below. What is the maximum multiplexing gain of each, i.e., how 
many independent scalar data streams can be supported reliably? 



' 1 


1 


-1 


1 


1 


1 


-1 


-1 


1 
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1 
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_ 1 


1 
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-1 
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1 


-1 
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1 


-1 


1 


1 


-1 


1 


1 


1 


-1 


-1 


-1 



5. The capacity of a static MIMO channel with only receiver CSI is given by $C = \sum_{i=1}^{R_H} \log_2\left(1 + \frac{\sigma_i^2 \rho}{M_t}\right)$. Show
that if the sum of singular values is bounded, this expression is maximized when all $R_H$ singular values are
equal.

6. Consider a MIMO system with the following channel matrix:

$$\mathbf{H} = \begin{bmatrix} .1 & .3 & .4 \\ .3 & .2 & .2 \\ .1 & .3 & .7 \end{bmatrix} = \begin{bmatrix} -.5196 & -.0252 & -.8541 \\ -.3460 & -.9077 & .2372 \\ -.7812 & .4188 & .4629 \end{bmatrix} \begin{bmatrix} .9719 & 0 & 0 \\ 0 & .2619 & 0 \\ 0 & 0 & .0825 \end{bmatrix} \begin{bmatrix} -.2406 & -.4727 & -.8477 \\ -.8894 & -.2423 & .3876 \\ .3886 & -.8472 & .3621 \end{bmatrix}.$$

Note that $\mathbf{H}$ is written in terms of its singular value decomposition (SVD) $\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}$.

(a) Check if $\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}$. You will see that the matrices $\mathbf{U}$, $\mathbf{\Lambda}$, and $\mathbf{V}$ do not have sufficiently large precision
so that $\mathbf{U}\mathbf{\Lambda}\mathbf{V}$ is only approximately equal to $\mathbf{H}$. This indicates the sensitivity of the SVD, in particular
the matrix $\mathbf{\Lambda}$, to small errors in the estimate of the channel matrix $\mathbf{H}$.

(b) Based on the singular value decomposition $\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}$, find an equivalent MIMO system consisting of
three independent channels. Find the transmit precoding filter and the receiver shaping filter necessary
to transform the original system into the equivalent system.

(c) Find the optimal power allocation $P_i$, $i = 1, 2, 3$, across the three channels found in part (b), and
the corresponding total capacity of the equivalent system, assuming $P/\sigma_n^2 = 20$ dB and the system
bandwidth $B = 100$ KHz.

(d) Compare the capacity in part (c) to that when the channel is unknown at the transmitter, so equal power
is allocated to each antenna.



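7. Show using properties of the SVD that for the MIMO channel known at the transmitter and receiver, the
general capacity expression

$$C = \max_{\mathbf{R}_x : \text{Tr}(\mathbf{R}_x) = \rho} B \log_2 \det\left[ \mathbf{I}_{M_r} + \mathbf{H}\mathbf{R}_x\mathbf{H}^H \right]$$

reduces to

$$C = \max_{\rho_i : \sum_i \rho_i \le \rho} \sum_i B \log_2(1 + \lambda_i \rho_i),$$

for singular values $\{\sqrt{\lambda_i}\}$ and SNR $\rho$.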



8. For the 4 x 4 MIMO channels given below, find their capacity per unit Hz assuming both transmitter and
receiver know the channel, for channel SNR $\rho = 10$ dB.
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-1 
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-1 
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-1 




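9. Assume a ZMCSCG MIMO system with channel matrix $\mathbf{H}$ corresponding to $M_t = M_r = M$ transmit and
receive antennas. Show using the law of large numbers that

$$\lim_{M \to \infty} \frac{1}{M} \mathbf{H}\mathbf{H}^H = \mathbf{I}_M.$$

Then use this to show that

$$\lim_{M \to \infty} B \log_2 \det\left[ \mathbf{I}_M + \frac{\rho}{M} \mathbf{H}\mathbf{H}^H \right] = M B \log_2(1 + \rho).$$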



10. Plot the ergodic capacities per unit Hz for a ZMCSCG MIMO channel with SNR $0 \le \rho \le 30$ dB for the
following MIMO dimensions:

(a) $M_t = M_r = 1$

(b) $M_t = 2$, $M_r = 1$

(c) $M_t = M_r = 2$

(d) $M_t = 2$, $M_r = 3$

(e) $M_t = M_r = 3$

Verify that at high SNRs, capacity grows linearly as $M = \min(M_t, M_r)$.

11. Plot the outage capacities per unit Hz for an outage probability of 1% for a ZMCSCG MIMO channel with
SNR $0 \le \rho \le 30$ dB for the following MIMO dimensions:

(a) $M_t = M_r = 1$

(b) $M_t = 2$, $M_r = 1$

(c) $M_t = M_r = 2$

(d) $M_t = 2$, $M_r = 3$

(e) $M_t = M_r = 3$

Verify that at high SNRs, capacity grows linearly as $M = \min(M_t, M_r)$.

12. Show that if the noise vector $\mathbf{n} = (n_1, \ldots, n_{M_r})$ has i.i.d. elements then, for $\|\mathbf{u}\| = 1$, the statistics of $\mathbf{u}^*\mathbf{n}$
are the same as the statistics for each of these elements.



13. Consider a MIMO system where the channel gain matrix $\mathbf{H}$ is known at the transmitter and receiver. Show
that if transmit and receive antennas are used for diversity, the optimal weights at the transmitter and receiver
lead to an SNR of $\gamma = \lambda_{\max} \rho$, where $\lambda_{\max}$ is the largest eigenvalue of $\mathbf{H}\mathbf{H}^H$.

14. Consider a channel with channel matrix

$$\mathbf{H} = \begin{bmatrix} .1 & .5 & .9 \\ .3 & .2 & .6 \\ .1 & .3 & .7 \end{bmatrix}.$$

Assuming $\rho = 10$ dB, find the output SNR when beamforming is used on the channel with equal weights
on each transmit antenna and optimal weighting at the receiver. Compare with the SNR under beamforming
with optimal weights at both the transmitter and receiver.



15. Consider an 8 x 4 MIMO system. Assume a coding scheme that can achieve the rate/diversity tradeoff
$d(r) = (M_t - r)(M_r - r)$.

(a) What is the maximum multiplexing rate for this channel given a required $P_e = \rho^{-d} \le 10^{-3}$, assuming
$\rho = 10$ dB?

(b) Given the $r$ in part (a), what is the resulting $P_e$?

16. Find the capacity of a SIMO channel with channel gain vector $\mathbf{h} = [.1 \; .4 \; .75 \; .9]$, optimal receiver weighting,
and $\rho = 10$ dB.



17. Consider a 2 x 2 MIMO system with channel gain matrix $\mathbf{H}$ given by

$$\mathbf{H} = \begin{bmatrix} .3 & .5 \\ .7 & .2 \end{bmatrix}.$$

Assume $\mathbf{H}$ is known at both the transmitter and receiver, and that there is a total transmit power of $P = 10$
mW across the two transmit antennas, AWGN with power $N_0 = 10^{-9}$ W/Hz at each receive antenna, and
bandwidth $B = 100$ KHz.

(a) Find the SVD for $\mathbf{H}$.

(b) Find the capacity of this channel.

(c) Assume transmit precoding and receiver shaping are used to transform this channel into two parallel 
independent channels with a total power constraint P. Find the maximum data rate that can be transmitted 
over this parallel set assuming MQAM modulation on each channel with optimal power adaptation 
across the channels subject to power constraint P. Assume a target BER of $10^{-3}$ on each channel, the 
BER is bounded by < and the constellation size of the MQAM is unrestricted. 

(d) Suppose now that the antennas at the transmitter and receiver are all used for diversity with optimal 
weighting at the transmitter and receiver to maximize the SNR of the combiner output. Find the SNR of 
the combiner output, and the BER of a BPSK modulated signal transmitted over this diversity system. 
Compare the data rate and BER of this BPSK signaling with diversity (assuming $B = 1/T_b$) to the rate 
and BER from part (b). 

(e) Comment on the diversity/multiplexing tradeoffs between the systems in parts (b) and (c). 

18. Consider an M x M MIMO channel with ZMCSCG channel gains. 

(a) Plot the ergodic capacity per unit Hz of this channel for M = 1 and M = 4 with $0 \le \rho \le 20$ dB 
assuming both transmitter and receiver have channel CSI. 

(b) Repeat part (a) assuming only the receiver has channel CSI. 

19. Find the outage capacity for a 4 x 4 MIMO channel with ZMCSCG elements at 10% outage for $\rho = 10$ dB. 

20. Plot the CDF of capacity for an M x M MIMO channel with $\rho = 10$ dB assuming no transmitter knowledge 
for M = 4, 6, 8. What happens as M increases? What are the implications of this behavior in a practical 
system design? 
Chapter 11 

Equalization 



We have seen in Chapter 6 that delay spread causes intersymbol interference (ISI). ISI can cause an irreducible 
error floor when the modulation symbol time is on the same order as the channel delay spread. Signal processing 
provides a powerful mechanism to counteract ISI. In a broad sense, equalization defines any signal processing 
technique used at the receiver to alleviate the ISI problem caused by delay spread. Signal processing can also be 
used at the transmitter to make the signal less susceptible to delay spread: spread spectrum and multicarrier mod- 
ulation fall in this category of transmitter signal processing techniques. In this chapter we focus on equalization. 
Multicarrier modulation and spread spectrum are the topics of Chapters 12 and 13, respectively. 

ISI mitigation is required when the modulation symbol time $T_s$ is on the order of the channel's rms delay 
spread $\sigma_{T_m}$. For example, cordless phones typically operate indoors, where the delay spread is small. Since voice 
is also a relatively low data rate application, equalization is generally not needed in cordless phones. However, the 
IS-54 digital cellular standard is designed for outdoor use, where $\sigma_{T_m} \approx T_s$, so equalization is part of this standard. 
Higher data rate applications are more sensitive to delay spread, and generally require high-performance equalizers 
or other ISI mitigation techniques. In fact, mitigating the impact of delay spread is one of the most challenging 
hurdles for high-speed wireless data systems. 

Equalizer design must typically balance ISI mitigation with noise enhancement, since both the signal and 
the noise pass through the equalizer, which can increase the noise power. Nonlinear equalizers suffer less from 
noise enhancement than linear equalizers, but typically entail higher complexity, as discussed in more detail below. 
Moreover, equalizers must typically have an estimate of the channel impulse or frequency response to mitigate 
the resulting ISI. Since the wireless channel varies over time, the equalizer must learn the frequency or impulse 
response of the channel (training) and then update its estimate of the frequency response as the channel changes 
(tracking). The process of equalizer training and tracking is often referred to as adaptive equalization, since the 
equalizer adapts to the changing channel. Equalizer training and tracking can be quite difficult if the channel 
is changing rapidly. In this chapter we will discuss the various design issues associated with equalizer design, 
including balancing ISI mitigation with noise enhancement, linear and nonlinear equalizer design and properties, 
and the process of equalizer training and tracking. 

An equalizer can be implemented at baseband, RF, or IF. Most equalizers are implemented digitally after 
A/D conversion, since such filters are small, cheap, easily tuneable, and very power efficient. This chapter mainly 
focuses on digital equalizer implementations, although for simplicity noise enhancement will be illustrated in the 
next section with an analog equalizer. 
11.1 Equalizer Noise Enhancement 



The goal of equalization is to mitigate the effects of ISI. However, this goal must be balanced so that in the process 
of removing ISI the noise power in the received signal is not enhanced. A simple analog equalizer, shown in 
Figure 11.1, illustrates the pitfalls of removing ISI without considering this effect on noise. Consider a signal s(t) 
that is passed through a channel with frequency response H(f). At the receiver front end white Gaussian noise 
n(t) is added to the signal, so the signal input to the receiver is Y(f) = S(f)H(f) + N(f), where N(f) has power 
spectral density $N_0$. If the bandwidth of s(t) is B then the noise power within the signal bandwidth of interest 
is $N_0 B$. Suppose we wish to equalize the received signal so as to completely remove the ISI introduced by the 
channel. This is easily done by introducing an analog equalizer in the receiver defined by 

$$H_{eq}(f) = 1/H(f). \tag{11.1}$$

The received signal Y(f) after passing through this equalizer becomes $[S(f)H(f) + N(f)]H_{eq}(f) = S(f) + N'(f)$, 
where N'(f) is colored Gaussian noise with power spectral density $N_0/|H(f)|^2$. Thus, all ISI has been 
removed from the transmitted signal S(f). 







Figure 11.1: Analog Equalizer Illustrating Noise Enhancement. 

However, if H(f) has a spectral null ($H(f_0) = 0$ for some $f_0$) at any frequency within the bandwidth of s(t), 
then the power of the noise N'(f) is infinite. Even without a spectral null, if some frequencies in H(f) are greatly 
attenuated, the equalizer $H_{eq}(f) = 1/H(f)$ will greatly enhance the noise power at those frequencies. In this case 
even though the ISI effects are removed, the equalized system will perform poorly due to its greatly reduced SNR. 
Thus, the true goal of equalization is to balance mitigation of the effects of ISI with maximizing the SNR of the 
post-equalization signal. Linear digital equalizers in general work by inverting the channel frequency response and 
therefore have the most noise enhancement. Nonlinear equalizers do not invert the channel frequency response, 
and thus tend to suffer much less from noise enhancement. In the next section we give an overview of the different 
types of linear and nonlinear equalizers, their structures, and the algorithms used for updating their tap coefficients 
in equalizer training and tracking. 



Example 11.1: Consider a channel with frequency response $H(f) = 1/\sqrt{|f|}$ for $|f| < B$, where B is the channel 
bandwidth. Given noise PSD $N_0/2$, what is the noise power for channel bandwidth B = 30 kHz with and without 
a linear equalizer? 

Solution: Without equalization the noise power is just $N_0 B = 3N_0 \times 10^4$. With equalization the noise PSD is 
$N_0|H_{eq}(f)|^2 = N_0/|H(f)|^2 = |f|N_0$, $|f| < B$. So the noise power is 

$$N_0 \int_{-B}^{B} |f|\,df = N_0 B^2 = 9N_0 \times 10^8,$$

an increase in noise power of more than four orders of magnitude! 
11.2 Equalizer Types 



Equalization techniques fall into two broad categories: linear and nonlinear. The linear techniques are generally 
the simplest to implement and to understand conceptually. However, linear equalization techniques typically suffer 
from more noise enhancement than nonlinear equalizers, and are therefore not used in most wireless applications. 
Among nonlinear equalization techniques, decision-feedback equalization (DFE) is the most common, since it is 
fairly simple to implement and generally performs well. However, on channels with low SNR, the DFE suffers 
from error propagation when bits are decoded in error, leading to poor performance. The optimal equalization 
technique is maximum likelihood sequence estimation (MLSE). Unfortunately, the complexity of this technique 
grows exponentially with the length of the delay spread, and is therefore impractical on most channels of interest. 
However, the performance of the MLSE is often used as an upper bound on performance for other equalization 
techniques. Figure 11.2 summarizes the different equalizer types, along with their corresponding structures and 
tap updating algorithms, which are discussed in more detail in [1]. 

Equalizers can also be categorized as symbol-by-symbol (SBS) or sequence estimators (SE). SBS equalizers 
remove ISI from each symbol and then detect each symbol individually. All linear equalizers in Figure 11.2 as 
well as the DFE are SBS equalizers. SEs detect sequences of symbols, so the effect of ISI is part of the estimation 
process. Maximum likelihood sequence estimation (MLSE) is the optimal form of sequence detection, but is highly 
complex. 

Linear and nonlinear equalizers are typically implemented using a transversal or lattice structure. The transversal 
structure is a filter with N - 1 delay elements and N taps with tunable complex weights. The lattice filter uses 
a more complex recursive structure [2]. In exchange for this increased complexity relative to transversal structures, 
lattice structures often have better numerical stability and convergence properties and greater flexibility in 
changing their length [3]. This chapter will focus on transversal structures: details on lattice structures and their 
performance relative to transversal structures can be found in [1, 2, 3, 4]. 

In addition to the equalizer type and structure, adaptive equalizers require algorithms for updating the filter 
tap coefficients during training and tracking. Many algorithms have been developed over the years for this purpose. 
These algorithms generally entail tradeoffs between complexity, convergence rate, and numerical stability. 

In the remainder of this chapter, after discussing conditions for ISI-free transmission, we will discuss the 
different equalizer types, their structures, and their update algorithms in more detail. 

11.3 Folded Spectrum and ISI-Free Transmission 

Equalizers are typically implemented digitally. Figure 11.3 shows a block diagram of an end-to-end system with a 
digital equalizer. The input symbol $d_k$ is passed through a pulse shape filter g(t) and then transmitted over the ISI 
channel with impulse response c(t). We define the equivalent channel impulse response h(t) = g(t) * c(t), and the 
transmitted signal is thus given by d(t) * g(t) * c(t) for $d(t) = \sum_k d_k \delta(t - kT_s)$ the train of information symbols. 
The pulse shape g(t) improves the spectral properties of the transmitted signal, as described in Chapter 5.5. This 
pulse shape is under the control of the system designer, whereas the channel c(t) is introduced by nature and is 
outside the designer's control. 

At the receiver front end white Gaussian noise n(t) is added to the received signal for a resulting signal w(t). 
This signal is passed through an analog matched filter $g_m^*(-t)$ to obtain output y(t), which is then sampled via 
an A/D converter. The purpose of the matched filter is to maximize the SNR of the signal before sampling and 
subsequent processing.¹ Recall from Chapter 5.1 that in AWGN the SNR of the received signal is maximized prior 
to sampling by using a filter that is matched to the pulse shape. This result indicates that for the system shown in 
Figure 11.3, SNR prior to sampling is maximized by passing w(t) through a filter matched to h(t), so ideally we 
would have $g_m(t) = h(t)$. However, since the channel impulse response c(t) is time-varying and analog filters are 
not easily tuneable, it is generally not possible to have $g_m(t) = h(t)$. Thus, part of the art of equalizer design is to 
choose $g_m(t)$ to get good performance. Often $g_m(t)$ is matched to the pulse shape g(t), which is the optimal pulse 
shape when $c(t) = \delta(t)$, but this design is clearly suboptimal when $c(t) \ne \delta(t)$. The fact that $g_m(t)$ cannot be 
matched to h(t) can result in significant performance degradation and also makes the receiver extremely sensitive 
to timing error. These problems are somewhat mitigated by sampling y(t) at a rate much faster than the symbol rate 
and designing the equalizer for this oversampled signal. This process is called fractionally-spaced equalization 
[1]. 

Figure 11.2: Equalizer Types, Structures, and Algorithms. 

The equalizer output then provides an estimate of the transmitted symbol. This estimate is then passed through 
a decision device that rounds the equalizer output to a symbol in the alphabet of possible transmitted symbols. 
During training the equalizer output is passed to the tap update algorithm to update the tap values such that the 
equalizer output matches the known training sequence. During tracking, the round-off error associated with the 
symbol decision is used to adjust the equalizer coefficients. 

Let f(t) denote the combined baseband impulse response of the transmitter, channel, and matched filter: 

$$f(t) = g(t) * c(t) * g_m^*(-t). \tag{11.2}$$

1 While the matched filter could be more efficiently implemented digitally, the analog implementation before the sampler allows for a 
smaller dynamic range in the sampler, which significantly reduces cost. 
Then the matched filter output is given by 

$$y(t) = d(t) * f(t) + n_g(t) = \sum_k d_k f(t - kT_s) + n_g(t), \tag{11.3}$$

where $n_g(t) = n(t) * g_m^*(-t)$ is the equivalent baseband noise at the equalizer input and $T_s$ is the symbol time. 
If we let $f[n] = f(nT_s)$ denote samples of f(t) every $T_s$ seconds then sampling y(t) every $T_s$ seconds yields the 
discrete time signal $y[n] = y(nT_s)$ given by 

$$\begin{aligned} y[n] &= \sum_{k=-\infty}^{\infty} d_k f(nT_s - kT_s) + n_g(nT_s) \\ &= \sum_{k=-\infty}^{\infty} d_k f[n-k] + \nu[n] \\ &= d_n f[0] + \sum_{k \ne n} d_k f[n-k] + \nu[n], \end{aligned} \tag{11.4}$$

where the first term in (11.4) is the desired data bit, the second term is the ISI, and the third term is the sampled 
baseband noise. We see from (11.4) that we get zero ISI if $f[n-k] = 0$ for $k \ne n$, i.e. $f[k] = \delta[k]f[0]$. In this 
case (11.4) reduces to $y[n] = d_n f[0] + \nu[n]$. 




Figure 11.3: End-to-End System. 

We now show that the condition for ISI-free transmission, $f[k] = \delta[k]f[0]$, is satisfied if and only if 

$$F_\Sigma(f) \triangleq \frac{1}{T_s}\sum_{n=-\infty}^{\infty} F\!\left(f + \frac{n}{T_s}\right) = f[0]. \tag{11.5}$$

The function $F_\Sigma(f)$ is often called the folded spectrum, and $F_\Sigma(f) = f[0]$ implies that the folded spectrum is flat. 
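To show this equivalence, first note that 

$$\begin{aligned} f[k] = f(kT_s) &= \int_{-\infty}^{\infty} F(f)\,e^{j2\pi fkT_s}\,df \\ &= \sum_{n=-\infty}^{\infty} \int_{.5(2n-1)/T_s}^{.5(2n+1)/T_s} F(f)\,e^{j2\pi fkT_s}\,df \\ &= \sum_{n=-\infty}^{\infty} \int_{-.5/T_s}^{.5/T_s} F\!\left(f' + \frac{n}{T_s}\right) e^{j2\pi(f' + n/T_s)kT_s}\,df' \\ &= \int_{-.5/T_s}^{.5/T_s} e^{j2\pi fkT_s} \left[\sum_{n=-\infty}^{\infty} F\!\left(f + \frac{n}{T_s}\right)\right] df. \end{aligned} \tag{11.6}$$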



We first show that a flat folded spectrum implies that $f[k] = \delta[k]f[0]$. Suppose (11.5) is true. Then by (11.6), 

$$f[k] = T_s\int_{-.5/T_s}^{.5/T_s} e^{j2\pi fkT_s} f[0]\,df = \frac{\sin \pi k}{\pi k} f[0] = \delta[k]f[0], \tag{11.7}$$

which is the desired result. We now show that $f[k] = \delta[k]f[0]$ implies a flat folded spectrum. If $f[k] = \delta[k]f[0]$ 
then by (11.6), 

$$f[k] = T_s\int_{-.5/T_s}^{.5/T_s} F_\Sigma(f)\,e^{j2\pi fkT_s}\,df. \tag{11.8}$$

So f[k] is the inverse Fourier transform of $F_\Sigma(f)$. Therefore, if $f[k] = \delta[k]f[0]$, $F_\Sigma(f) = f[0]$. 



Example 11.2: Consider a channel with combined baseband impulse response $f(t) = \mathrm{sinc}(t/T_s)$. Find the folded 
spectrum and determine if this channel exhibits ISI. 

Solution: The Fourier transform of f(t) is 

$$F(f) = T_s\,\mathrm{rect}(fT_s) = \begin{cases} T_s & |f| < .5/T_s \\ .5T_s & |f| = .5/T_s \\ 0 & |f| > .5/T_s \end{cases}$$

Thus, 

$$F_\Sigma(f) = \frac{1}{T_s}\sum_{n=-\infty}^{\infty} F\!\left(f + \frac{n}{T_s}\right) = 1,$$

so the folded spectrum is flat and there is no ISI. We can also see this from the fact that 

$$f(nT_s) = \mathrm{sinc}(nT_s/T_s) = \mathrm{sinc}(n) = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0 \end{cases}$$

Thus, $f[k] = \delta[k]$, our equivalent condition for a flat folded spectrum and zero ISI. 
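
The time-domain form of the ISI-free condition, f[k] = δ[k]f[0], is easy to check numerically. The following is a minimal sketch rather than part of the original text: the helper names, the symbol time, and the "stretched" pulse used as a counterexample are all illustrative assumptions.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def symbol_rate_samples(f, Ts, K):
    """Return the samples f[k] = f(k*Ts) for k = -K..K."""
    return [f(k * Ts) for k in range(-K, K + 1)]

def is_isi_free(samples, tol=1e-9):
    """True if every sample other than the center one f[0] is (numerically) zero."""
    center = len(samples) // 2
    return all(abs(s) < tol for i, s in enumerate(samples) if i != center)

Ts = 1e-3  # assumed symbol time in seconds

# Pulse of Example 11.2: zero crossings fall exactly on the other symbol instants.
ideal = symbol_rate_samples(lambda t: sinc(t / Ts), Ts, 5)

# Hypothetical pulse stretched by delay spread: zero crossings no longer line up.
stretched = symbol_rate_samples(lambda t: sinc(t / (1.5 * Ts)), Ts, 5)

print(is_isi_free(ideal))      # the sinc(t/Ts) pulse satisfies f[k] = delta[k]
print(is_isi_free(stretched))  # the stretched pulse leaves residual ISI samples
```

The stretched pulse fails the test because its samples at nonzero multiples of $T_s$ are generally nonzero, which is exactly the second term of (11.4).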
11.4 Linear Equalizers 



If $F_\Sigma(f)$ is not flat, we can use the equalizer $H_{eq}(z)$ in Fig. 11.3 to reduce ISI. In this section we assume a linear 
equalizer implemented via an N = 2L + 1 tap transversal filter: 

$$H_{eq}(z) = \sum_{i=-L}^{L} w_i z^{-i}. \tag{11.9}$$

The length of the equalizer N is typically dictated by implementation considerations, since a large N usually entails 
higher complexity. Causal linear equalizers have $w_i = 0$ for $i < 0$. For a given equalizer size N the equalizer design 
must specify the tap weights $\{w_i\}_{i=-L}^{L}$ for a given channel frequency response, and the algorithm for updating 
these tap weights as the channel varies. Recall that our performance metric in wireless systems is probability 
of error (or outage probability), so for a given channel the optimal choice of equalizer coefficients would be the 
coefficients that minimize probability of error. Unfortunately it is extremely difficult to optimize the $\{w_i\}$s with 
respect to this criterion. Since we cannot directly optimize for our desired performance metric, we must instead 
use an indirect optimization that balances ISI mitigation with the prevention of noise enhancement, as discussed 
relative to the simple analog example above. We now describe two linear equalizers: the Zero Forcing (ZF) 
equalizer and the Minimum Mean Square Error (MMSE) equalizer. The former equalizer cancels all ISI, but can 
lead to considerable noise enhancement. The latter technique minimizes the expected mean squared error between 
the transmitted symbol and the symbol detected at the equalizer output, thereby providing a better balance between 
ISI mitigation and noise enhancement. Because of this more favorable balance, MMSE equalizers tend to have 
better BER performance than equalizers using the ZF algorithm. 



11.4.1 Zero Forcing (ZF) Equalizers 

From (11.4), the samples $\{y_n\}$ input to the equalizer can be represented based on the discretized combined system 
response $f(t) = h(t) * g_m^*(-t)$ as 

$$Y(z) = D(z)F(z) + N_g(z), \tag{11.10}$$

where $N_g(z)$ is the power spectrum of the white noise after passing through the matched filter $G_m^*(1/z^*)$ and 

$$F(z) = H(z)G_m^*(1/z^*) = \sum_n f(nT_s)z^{-n}. \tag{11.11}$$

The zero-forcing equalizer removes all ISI introduced in the combined response f(t). From (11.10) we see that 
the equalizer to accomplish this is given by 

$$H_{ZF}(z) = \frac{1}{F(z)}. \tag{11.12}$$

This is the discrete-time equivalent to the analog equalizer (11.1) described above, and it suffers from the same 
noise enhancement properties. Specifically, the power spectrum N(z) of the noise at the equalizer output is given by 

$$N(z) = N_g(z)|H_{ZF}(z)|^2 = \frac{N_0|G_m^*(1/z^*)|^2}{|F(z)|^2} = \frac{N_0|G_m^*(1/z^*)|^2}{|H(z)|^2|G_m^*(1/z^*)|^2} = \frac{N_0}{|H(z)|^2}. \tag{11.13}$$

We see from (11.13) that if the channel H(z) is sharply attenuated at any frequency within the bandwidth of 
interest, as is common on frequency-selective fading channels, the noise power will be significantly increased. 
This motivates an equalizer design that better optimizes between ISI mitigation and noise enhancement. One such 
equalizer is the MMSE equalizer, described in the next section. 
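
To get a feel for the size of the noise enhancement in (11.13), the sketch below averages $1/|H(e^{j\omega})|^2$ over one period for a hypothetical two-tap channel $H(z) = 1 + az^{-1}$. The channel, the function name, and the step count are illustrative assumptions; for this particular channel with |a| < 1 the average is known in closed form to be $1/(1 - a^2)$, which serves as a check.

```python
import cmath
import math

def zf_noise_gain(h, steps=4096):
    """Average of 1/|H(e^{jw})|^2 over one period, for H(z) = sum_n h[n] z^{-n}.

    This is the factor by which a zero-forcing equalizer 1/H(z) scales the
    power of white noise at its input (midpoint-rule estimate).
    """
    total = 0.0
    for i in range(steps):
        w = 2 * math.pi * (i + 0.5) / steps
        H = sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))
        total += 1.0 / abs(H) ** 2
    return total / steps

# As a approaches 1 the channel develops a deep notch near w = pi,
# and the noise gain grows without bound.
for a in (0.5, 0.9, 0.99):
    print(a, zf_noise_gain([1.0, a]), 1.0 / (1.0 - a * a))
```

The two printed columns should agree closely, and the rapid growth as a approaches 1 mirrors the spectral-null argument made for the analog equalizer in Section 11.1.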
The ZF equalizer defined by $H_{ZF}(z) = 1/F(z)$ may not be implementable as a finite impulse response (FIR) 
filter. Specifically, it may not be possible to find a finite set of coefficients $w_{-L}, \ldots, w_L$ such that 

$$w_{-L}z^{L} + \cdots + w_L z^{-L} = \frac{1}{F(z)}. \tag{11.14}$$

In this case we find the set of coefficients $\{w_i\}$ that best approximates the zero-forcing equalizer. Note that 
this is not straightforward since the approximation must be valid for all values of z. There are many ways we 
can make this approximation. One technique is to represent $H_{ZF}(z)$ as an infinite impulse response (IIR) filter, 
$1/F(z) = \sum_{i=-\infty}^{\infty} c_i z^{-i}$, and then set $w_i = c_i$. It can be shown that this minimizes 

$$\left|\frac{1}{F(z)} - \left(w_{-L}z^{L} + \cdots + w_L z^{-L}\right)\right|^2$$

at $z = e^{j\omega}$. Alternatively, the tap weights can be set to minimize the peak distortion (worst-case ISI). Finding the 
tap weights to minimize peak distortion is a convex optimization problem and can be solved by standard techniques, 
e.g. the method of steepest descent [1]. 



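Example 11.3: Consider a channel with impulse response 

$$h(t) = \begin{cases} e^{-t/\tau} & t \ge 0, \\ 0 & \text{else.} \end{cases}$$

The channel also has AWGN with power spectral density $N_0$. Find a two-tap ZF equalizer for this channel. 

Solution: We have 

$$h[n] = \delta[n] + e^{-T_s/\tau}\delta[n-1] + e^{-2T_s/\tau}\delta[n-2] + \cdots.$$

Thus, 

$$H(z) = 1 + e^{-T_s/\tau}z^{-1} + e^{-2T_s/\tau}z^{-2} + e^{-3T_s/\tau}z^{-3} + \cdots = \sum_{n=0}^{\infty}\left(e^{-T_s/\tau}z^{-1}\right)^n = \frac{z}{z - e^{-T_s/\tau}}.$$

So $H_{eq}(z) = \frac{1}{H(z)} = 1 - e^{-T_s/\tau}z^{-1}$. The two tap ZF equalizer therefore has tap weight coefficients $w_0 = 1$ and 
$w_1 = -e^{-T_s/\tau}$. 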
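
The result of Example 11.3 can be verified by convolving the sampled channel with the two equalizer taps. This is a small sketch under stated assumptions: the ratio $T_s/\tau = 0.5$ and the truncation of the channel to 20 samples are arbitrary illustrative choices, and `convolve` is a helper written here rather than a library routine.

```python
import math

def convolve(a, b):
    """Plain discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

Ts_over_tau = 0.5                      # assumed ratio T_s / tau
a = math.exp(-Ts_over_tau)             # e^{-T_s/tau}
n_taps = 20                            # truncation length (assumption)

h = [a ** n for n in range(n_taps)]    # h[n] = e^{-n T_s/tau}, truncated
w = [1.0, -a]                          # two-tap ZF equalizer: w_0 = 1, w_1 = -e^{-T_s/tau}

combined = convolve(h, w)              # equalized response h[n] * w[n]
print(combined[0])                                   # unit response at n = 0
print(max(abs(c) for c in combined[1:n_taps]))       # ISI terms cancel (up to rounding)
print(combined[-1])                                  # leftover -a**n_taps from truncating h
```

The only nonzero term besides the first is an artifact of truncating the infinitely long channel response; it shrinks geometrically as more channel samples are kept.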



11.4.2 Minimum Mean Square Error (MMSE) Equalizer 

In MMSE equalization the goal of the equalizer design is to minimize the average mean square error (MSE) 
between the transmitted symbol $d_k$ and its estimate $\hat{d}_k$ at the output of the equalizer. In other words the $\{w_i\}$'s are 
chosen to minimize $E[d_k - \hat{d}_k]^2$. Since the MMSE is a linear equalizer, its output $\hat{d}_k$ is a linear combination of the 
input samples y[k]: 

$$\hat{d}_k = \sum_{i=-L}^{L} w_i y[k-i]. \tag{11.15}$$
As such, finding the optimal filter coefficients $\{w_i\}$ becomes a standard problem in linear estimation. In fact, if the 
noise input to the equalizer is white, this is a standard Wiener filtering problem. However, because of the matched 
filter $g_m^*(-t)$ at the receiver front end, the noise input to the equalizer is not white but colored with power spectrum 
$N_0|G_m^*(1/z^*)|^2$. Therefore, in order to apply known techniques for optimal linear estimation, we expand the filter 
$H_{eq}(z)$ into two components, a noise whitening component $1/G_m^*(1/z^*)$ and an ISI removal component $\hat{H}_{eq}(z)$, 
as shown in Figure 11.4. 




Figure 11.4: MMSE Equalizer with Noise Whitening Filter. 

The purpose of the noise whitening filter, as indicated by the name, is to whiten the noise such that the noise 
component output from this filter has a constant power spectrum. Since the noise input to this receiver has power 
spectrum $N_0|G_m^*(1/z^*)|^2$, the appropriate noise whitening filter is $1/G_m^*(1/z^*)$. The noise power spectrum at the 
output of the noise whitening filter is then $N_0|G_m^*(1/z^*)|^2/|G_m^*(1/z^*)|^2 = N_0$. Note that the filter $1/G_m^*(1/z^*)$ 
is not the only filter that will whiten the noise, and another noise whitening filter with more desirable properties 
(like stability) may be chosen. It might seem odd at first to introduce the matched filter $g_m^*(-t)$ at the receiver 
front end only to cancel its effect in the equalizer. Recall, however, that the matched filter is meant to maximize the 
SNR prior to sampling. By removing the effect of this matched filter through noise whitening after sampling, we 
merely simplify the design of $\hat{H}_{eq}(z)$ to minimize MSE. In fact if the noise whitening filter does not yield optimal 
performance then its effect would be cancelled by the $\hat{H}_{eq}(z)$ filter design, as we will see below in the case of IIR 
MMSE equalizers. 

We assume the filter $\hat{H}_{eq}(z)$, with input $v_n$, is a linear filter with N = 2L + 1 taps: 

$$\hat{H}_{eq}(z) = \sum_{i=-L}^{L} w_i z^{-i}. \tag{11.16}$$

Our goal is to design the filter coefficients $\{w_i\}$ so as to minimize $E[d_k - \hat{d}_k]^2$. This is the same goal as for the 
total filter $H_{eq}(z)$; we have just added the noise whitening filter to make solving for these coefficients simpler. Define 
$\mathbf{v} = (v[k+L], v[k+L-1], \ldots, v[k-L]) = (v_{k+L}, v_{k+L-1}, \ldots, v_{k-L})$ as a vector of inputs to the filter $\hat{H}_{eq}(z)$ 
used to obtain the filter output $\hat{d}_k$ and $\mathbf{w} = (w_{-L}, \ldots, w_L)$ as the vector of filter coefficients. Then 

$$\hat{d}_k = \mathbf{w}^T\mathbf{v} = \mathbf{v}^T\mathbf{w}. \tag{11.17}$$
Thus, we want to minimize the mean square error 

$$J = E[d_k - \hat{d}_k]^2 = E\left[\mathbf{w}^T\mathbf{v}\mathbf{v}^H\mathbf{w}^* - 2\Re\{\mathbf{v}^H\mathbf{w}^* d_k\} + |d_k|^2\right]. \tag{11.18}$$

Define $\mathbf{M}_v = E[\mathbf{v}\mathbf{v}^H]$ and $\mathbf{v}_d = E[\mathbf{v}^H d_k]$. The matrix $\mathbf{M}_v$ is an N x N Hermitian matrix and $\mathbf{v}_d$ is a length N 
row vector. Assume $E|d_k|^2 = 1$. Then the MSE J is 

$$J = \mathbf{w}^T\mathbf{M}_v\mathbf{w}^* - 2\Re\{\mathbf{v}_d\mathbf{w}^*\} + 1. \tag{11.19}$$

We obtain the optimal tap vector $\mathbf{w}$ by setting the gradient $\nabla_{\mathbf{w}}J = 0$ and solving for $\mathbf{w}$. From (11.19) the 
gradient is given by 

$$\nabla_{\mathbf{w}}J = \left(\frac{\partial J}{\partial w_{-L}}, \ldots, \frac{\partial J}{\partial w_L}\right) = 2\mathbf{w}^T\mathbf{M}_v - 2\mathbf{v}_d. \tag{11.20}$$

Setting this to zero yields $\mathbf{w}^T\mathbf{M}_v = \mathbf{v}_d$ or, equivalently, that the optimal tap weights are given by 

$$\mathbf{w}_{opt} = \left(\mathbf{M}_v^T\right)^{-1}\mathbf{v}_d^T. \tag{11.21}$$

Note that solving for $\mathbf{w}_{opt}$ requires a matrix inversion with respect to the filter inputs. Thus, the complexity of this 
computation is quite high, typically on the order of $N^2$ to $N^3$ operations. Substituting in these optimal tap weights 
we obtain the minimum mean square error as 

$$J_{min} = 1 - \mathbf{v}_d\mathbf{M}_v^{-1}\mathbf{v}_d^H. \tag{11.22}$$
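
As an illustration of (11.21) and (11.22), the sketch below solves the tap equations for a finite-length equalizer in a simplified setting. It is not the text's general complex-valued derivation: it assumes real channel taps, i.i.d. unit-power symbols, and white noise of variance `noise_var` at the equalizer input, so that $\mathbf{M}_v$ reduces to a symmetric Toeplitz matrix built from the channel autocorrelation. The function name `mmse_taps` and the example channel are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def mmse_taps(f, noise_var, L):
    """Finite-length MMSE taps w_{-L..L} for y[k] = sum_j f[j] d[k-j] + n[k].

    Assumes real causal channel taps f[0..], i.i.d. unit-power symbols d[k],
    and white noise of variance noise_var. Returns (w, J_min).
    """
    f = np.asarray(f, dtype=float)
    idx = range(-L, L + 1)

    def f_at(j):
        # f[j], taken as zero outside the channel support
        return f[j] if 0 <= j < len(f) else 0.0

    def autocorr(m):
        # sum_j f[j] f[j+m]
        return sum(f_at(j) * f_at(j + m) for j in range(len(f)))

    # M_v[a, b] = E[y[k-i_a] y[k-i_b]] and v_d[a] = E[y[k-i_a] d[k]] = f[-i_a]
    M_v = np.array([[autocorr(ia - ib) + (noise_var if ia == ib else 0.0)
                     for ib in idx] for ia in idx])
    v_d = np.array([f_at(-i) for i in idx])

    w = np.linalg.solve(M_v, v_d)   # real-valued counterpart of (11.21)
    j_min = 1.0 - v_d @ w           # real-valued counterpart of (11.22)
    return w, j_min

w, j_min = mmse_taps([1.0, 0.5], noise_var=0.01, L=5)
print(np.round(w, 4), j_min)
```

Solving the linear system directly costs on the order of $N^3$ operations, consistent with the complexity remark above; structured (Toeplitz) solvers can reduce this, which is one motivation for the adaptive algorithms discussed later in the chapter.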

For an infinite length equalizer, $\mathbf{v} = (v_{n+\infty}, \ldots, v_n, \ldots, v_{n-\infty})$ and $\mathbf{w} = (w_{-\infty}, \ldots, w_0, \ldots, w_{\infty})$. Then 
$\mathbf{w}^T\mathbf{M}_v = \mathbf{v}_d$ can be written as [5, Chapter 7.4] 

$$\sum_{i=-\infty}^{\infty} w_i\left(f[j-i] + N_0\delta[j-i]\right) = g_m^*[-j], \quad -\infty \le j \le \infty. \tag{11.23}$$

Taking z transforms and noting that $\hat{H}_{eq}(z)$ is the z transform of the filter coefficients $\mathbf{w}$ yields 

$$\hat{H}_{eq}(z)\left(F(z) + N_0\right) = G_m^*(1/z^*). \tag{11.24}$$

Solving for $\hat{H}_{eq}(z)$ yields 

$$\hat{H}_{eq}(z) = \frac{G_m^*(1/z^*)}{F(z) + N_0}. \tag{11.25}$$

Since the MMSE equalizer consists of the noise whitening filter $1/G_m^*(1/z^*)$ plus the ISI removal component 
$\hat{H}_{eq}(z)$, we get that the full MMSE equalizer, when it is not restricted to be finite length, becomes 

$$H_{eq}(z) = \frac{\hat{H}_{eq}(z)}{G_m^*(1/z^*)} = \frac{1}{F(z) + N_0}. \tag{11.26}$$

There are three interesting things to notice about this result. First of all, the ideal infinite length MMSE equalizer 
cancels out the noise whitening filter. Second, this infinite length equalizer is identical to the ZF filter except for 
the noise term $N_0$, so in the absence of noise the two equalizers are equivalent. Finally, this ideal equalizer design 
clearly shows a balance between inverting the channel and noise enhancement: if F(z) is highly attenuated at 
some frequency the noise term $N_0$ in the denominator prevents the noise from being significantly enhanced by 
the equalizer. Yet at frequencies where the noise power spectral density $N_0$ is small compared to the composite 
channel F(z), the equalizer effectively inverts F(z). 
For the equalizer (11.26) it can be shown [1, Chapter 10.2] that the minimum MSE (11.22) can be expressed 
in terms of the folded spectrum $F_\Sigma(f)$ as 

$$J_{min} = T_s\int_{-.5/T_s}^{.5/T_s} \frac{N_0}{F_\Sigma(f) + N_0}\,df. \tag{11.27}$$

This expression for MMSE has several interesting properties. First it can be shown that, as expected, $0 \le J_{min} = 
E[d_k - \hat{d}_k]^2 \le 1$. In addition, $J_{min} = 0$ in the absence of noise ($N_0 = 0$) as long as $F_\Sigma(f) \ne 0$ within the signal 
bandwidth of interest. Also, as expected, $J_{min} = 1$ if $N_0 = \infty$. 



Example 11.4: Find $J_{min}$ when the folded spectrum $F_\Sigma(f)$ is flat, $F_\Sigma(f) = f[0]$, in the asymptotic limit of high 
and low SNR. 

Solution: If $F_\Sigma(f) = f[0] \triangleq f_0$ then 

$$J_{min} = T_s\int_{-.5/T_s}^{.5/T_s} \frac{N_0}{f_0 + N_0}\,df = \frac{N_0}{f_0 + N_0}.$$

For high SNR, $f_0 \gg N_0$ so $J_{min} \approx N_0/f_0 = N_0/E_s$, where $E_s/N_0$ is the SNR per symbol. For low SNR, 
$N_0 \gg f_0$, so $J_{min} = N_0/(N_0 + f_0) \approx N_0/N_0 = 1$. 
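
The integral (11.27) is also straightforward to evaluate numerically for any folded spectrum. The sketch below uses a midpoint rule and checks it against the closed form of Example 11.4; the function name, step count, and numerical values of $T_s$, $f_0$, and $N_0$ are illustrative assumptions.

```python
def mmse_jmin(folded_spectrum, N0, Ts, steps=10000):
    """Midpoint-rule estimate of (11.27):
    J_min = Ts * integral over |f| <= .5/Ts of N0 / (F_sigma(f) + N0) df.
    """
    df = (1.0 / Ts) / steps
    total = 0.0
    for i in range(steps):
        f = -0.5 / Ts + (i + 0.5) * df
        total += N0 / (folded_spectrum(f) + N0) * df
    return Ts * total

Ts, f0 = 1e-3, 1.0                      # assumed symbol time and flat spectrum level

for N0 in (1e-3, 1.0, 1e3):             # high, moderate, and low SNR
    numeric = mmse_jmin(lambda f: f0, N0, Ts)
    closed_form = N0 / (f0 + N0)        # result of Example 11.4
    print(N0, numeric, closed_form)
```

At high SNR the printed value approaches $N_0/f_0$ and at low SNR it approaches 1, matching the two asymptotic limits in the example.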



11.5 Maximum Likelihood Sequence Estimation 



Maximum-likelihood sequence estimation (MLSE) avoids the problem of noise enhancement since it doesn't use 
an equalizing filter: instead it estimates the sequence of transmitted symbols. The structure of the MLSE is the same 
as in Figure 11.3 except that the equalizer $H_{eq}(z)$ and decision device are replaced by the MLSE algorithm. Given 
the channel response h(t), the MLSE algorithm chooses the input sequence $\{d_k\}$ that maximizes the likelihood of 
the received signal w(t). We now investigate this algorithm in more detail. 

Using a Gram-Schmidt orthonormalization procedure we can express w(t) on a time interval $[0, LT_s]$ as 

$$w(t) = \sum_{n=1}^{N} w_n\phi_n(t), \tag{11.28}$$

where $\{\phi_n(t)\}$ form a complete set of orthonormal basis functions. The number N of functions in this set is a 
function of the channel memory, since w(t) on $[0, LT_s]$ depends on $d_0, \ldots, d_L$. With this expansion we have 

$$w_n = \sum_{k=-\infty}^{\infty} d_k h_{nk} + \nu_n = \sum_{k=0}^{L} d_k h_{nk} + \nu_n, \tag{11.29}$$

where 

$$h_{nk} = \int_0^{LT_s} h(t - kT_s)\phi_n^*(t)\,dt \tag{11.30}$$

and 

$$\nu_n = \int_0^{LT_s} n(t)\phi_n^*(t)\,dt. \tag{11.31}$$
The $\nu_n$ are complex Gaussian random variables with mean zero and covariance $.5E[\nu_n^*\nu_m] = N_0\delta[n-m]$. 
Thus, $\mathbf{w}^N = (w_1, \ldots, w_N)$ has a multivariate Gaussian distribution 

(11.32) 

Given a received signal w(t) or, equivalently, $\mathbf{w}^N$, the MLSE decodes this as the symbol sequence $\mathbf{d}^L$ that 
maximizes the likelihood function $p(\mathbf{w}^N | \mathbf{d}^L, h(t))$ (or the log of this function). That is, the MLSE outputs the 
sequence 
$$\sum_{n=1}^{N} w_n h_{nk}^* = \int w(\tau)h^*(\tau - kT_s)\,d\tau = y[k],$$
Combining (11.33), (11.34), and (11.35) we have that 
We see from this equation that the MLSE output depends only on the sampler output $\{y[k]\}$ and the channel 
parameters $f[n-k] = f(nT_s - kT_s)$ where $f(t) = h(t) * h^*(-t)$. Since the derivation of the MLSE is based on 
the channel output w(t) only (prior to matched filtering), our derivation implies that the receiver matched filter in 
Figure 11.3 is optimal for MLSE detection (typically the matched filter is optimal for detecting signals in AWGN, 
but this derivation shows that it is also optimal for detecting signals in the presence of ISI if MLSE is used). 

The Viterbi algorithm can be used for MLSE to reduce complexity [1, 5, 6, 7]. However, the complexity of 
this equalization technique still grows exponentially with the channel delay spread. A nonlinear technique with 
significantly less complexity is the decision-feedback equalizer, or DFE. 
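
To make the role of the Viterbi algorithm concrete, here is a minimal sketch of sequence estimation for a deliberately simple case. It assumes a discrete-time model with white noise and a two-tap real channel, r[k] = h0 d[k] + h1 d[k-1] + noise, BPSK symbols, and a known starting symbol; this is a simplification and not the matched-filter formulation derived above. The function name and the numerical values are hypothetical.

```python
def viterbi_mlse(r, h0, h1, start=1):
    """Most likely BPSK sequence for r[k] = h0*d[k] + h1*d[k-1] + white noise.

    The state is the most recent symbol, so a channel memory of one symbol
    gives two states; a memory of m symbols would need 2**m states.
    """
    symbols = (-1, 1)
    # cost[s]: smallest accumulated squared error over paths ending in symbol s
    cost = {s: (0.0 if s == start else float("inf")) for s in symbols}
    paths = {s: [] for s in symbols}
    for rk in r:
        new_cost, new_paths = {}, {}
        for cur in symbols:
            # choose the best predecessor for a path that now ends in `cur`
            best_prev = min(
                symbols,
                key=lambda prev: cost[prev] + (rk - h0 * cur - h1 * prev) ** 2,
            )
            branch = (rk - h0 * cur - h1 * best_prev) ** 2
            new_cost[cur] = cost[best_prev] + branch
            new_paths[cur] = paths[best_prev] + [cur]
        cost, paths = new_cost, new_paths
    return paths[min(symbols, key=lambda s: cost[s])]

h0, h1 = 1.0, 0.5                               # assumed channel taps
d = [1, -1, -1, 1, 1, -1]                       # transmitted symbols
prev = [1] + d[:-1]                             # d[k-1], with d[-1] = +1 assumed known
noise = [0.1, -0.2, 0.15, -0.1, 0.2, -0.05]     # small fixed perturbation
r = [h0 * c + h1 * p + e for c, p, e in zip(d, prev, noise)]

decoded = viterbi_mlse(r, h0, h1)
print(decoded == d)
```

The number of states doubles with each additional symbol of channel memory, which is the exponential growth in complexity with delay spread noted above.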



11.6 Decision-Feedback Equalization 

The DFE consists of a feedforward filter W(z) with the received sequence as input (similar to the linear equalizer) 
followed by a feedback filter V(z) with the previously detected sequence as input. The DFE structure is shown 
in Figure 11.5. Effectively, the DFE determines the ISI contribution from the detected symbols $\{\hat{\hat{d}}_n\}$ by passing 
them through the feedback filter that approximates the combined discrete equivalent baseband channel F(z). The 
resulting ISI is then subtracted from the incoming symbols. Since the feedback filter V(z) in Figure 11.5 sits in 
a feedback loop, it must be strictly causal, or else the system is unstable. The feedback filter of the DFE does not 
suffer from noise enhancement because it estimates the channel frequency response rather than its inverse. For 
channels with deep spectral nulls, DFEs generally perform much better than linear equalizers. 
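
The feedback idea can be illustrated with a stripped-down sketch. It assumes BPSK symbols, a noiseless channel with known real taps, a feedforward filter reduced to a single gain 1/f[0], and feedback taps set equal to the postcursor channel taps. These are illustrative assumptions, not the optimized filters discussed below, and the channel values are hypothetical.

```python
def dfe_detect(y, f0, post):
    """Symbol-by-symbol BPSK detection with decision feedback.

    y    : received samples y[k] = f0*d[k] + sum_i post[i-1]*d[k-i]
    f0   : main channel tap (feedforward filter reduced to the gain 1/f0)
    post : postcursor channel taps f[1], f[2], ... used as feedback taps
    """
    decisions = []
    for k, yk in enumerate(y):
        # ISI estimate rebuilt from symbols that have already been detected
        isi = sum(fi * decisions[k - i]
                  for i, fi in enumerate(post, start=1) if k - i >= 0)
        estimate = (yk - isi) / f0
        decisions.append(1 if estimate >= 0 else -1)
    return decisions

f = [1.0, 0.8, 0.5]                      # assumed channel: ISI exceeds the main tap
d = [1, 1, -1, 1, -1, -1, 1]             # transmitted BPSK symbols
y = [sum(f[i] * d[k - i] for i in range(len(f)) if k - i >= 0)
     for k in range(len(d))]

naive = [1 if v >= 0 else -1 for v in y]   # threshold detector with no equalization
dfe = dfe_detect(y, f[0], f[1:])

print(naive == d)   # the unequalized detector makes errors on this channel
print(dfe == d)     # the DFE recovers the sequence when past decisions are correct
```

Because the feedback path subtracts ISI rather than inverting the channel, no noise term is amplified in the process; the price is that a wrong decision feeds an incorrect ISI estimate into later symbols.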







Figure 11.5: Decision-Feedback Equalizer Structure. 

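Assuming $W(z)$ has $N_1$ taps and $V(z)$ has $N_2$ taps, we can write the DFE output as

$$\hat{d}_k = \sum_{i=-N_1}^{0} w_i\, y[k-i] - \sum_{i=1}^{N_2} v_i\, \hat{\hat{d}}_{k-i},$$

where $\hat{d}_k$ is the equalizer output and $\hat{\hat{d}}_{k-i}$ are the previously detected symbols.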

The typical criteria for selecting the coefficients for $W(z)$ and $V(z)$ are either zero-forcing (remove all ISI) or
MMSE (minimize the expected MSE between the DFE output and the original symbol). When both $W(z)$ and
$V(z)$ have infinite duration, it was shown by Price that the optimal feedforward filter for a zero-forcing DFE is
$1/G_m^*(1/z^*)$, the same noise whitening filter as in the linear MMSE equalizer [9]. In this case the feedback filter
$V(z)$ should be essentially the same as the combined baseband channel $F(z)$.
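
The DFE output equation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `dfe`, the array layout of the taps, and the noiseless two-tap channel used in the usage lines are all assumptions made for the example, and the slicer is a plain sign detector suitable only for BPSK.

```python
import numpy as np

def dfe(y, w, v):
    """Decision-feedback equalizer for BPSK (illustrative sketch).

    y : received samples y[k]
    w : feedforward taps, w[j] holds w_i for i = -N1 + j, j = 0..N1
    v : feedback taps, v[j] holds v_{j+1}, j = 0..N2-1
    Returns (soft outputs d_hat, hard decisions d_hat_hat).
    """
    n1 = len(w) - 1
    n2 = len(v)
    # Zero-pad so that y[k + N1] exists for the last symbols.
    y_pad = np.concatenate([np.asarray(y, dtype=float), np.zeros(n1)])
    soft = np.zeros(len(y))
    hard = np.zeros(len(y))
    for k in range(len(y)):
        # Feedforward part: sum_{i=-N1}^{0} w_i y[k-i]
        ff = sum(w[j] * y_pad[k + n1 - j] for j in range(n1 + 1))
        # Feedback part: sum_{i=1}^{N2} v_i * (previous hard decisions)
        fb = sum(v[i - 1] * hard[k - i] for i in range(1, n2 + 1) if k - i >= 0)
        soft[k] = ff - fb
        hard[k] = np.sign(soft[k])  # threshold detector for +/-1 symbols
    return soft, hard

# Usage: assumed channel F(z) = 1 + 0.5 z^{-1}, no noise.
d = np.array([1, -1, -1, 1, 1, -1, 1, -1], dtype=float)
y = np.convolve(d, [1.0, 0.5])[:len(d)]
soft, hard = dfe(y, w=[1.0], v=[0.5])
print(hard)
```

With a single feedforward tap and the feedback tap set to the channel's postcursor, every decision is correct in the noiseless case, which is the situation in which the feedback filter "should be essentially the same as" $F(z)$.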

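For the MMSE criterion, we wish to minimize $E[d_k - \hat{d}_k]^2$. Let $f_n = f[n]$ denote the samples of $f(t)$.
Then this minimization implies that the coefficients of the feedforward filter must satisfy the following set of linear
equations:

$$\sum_{i=-N_1}^{0} q_{li}\, w_i = f^*_{-l},$$

for $q_{li} = \sum_{j=0}^{-l} f_j^* f_{j+l-i} + N_0\,\delta[l-i]$, $l, i = -N_1, \ldots, 0$. The coefficients of the feedback filter are then
determined from the feedforward coefficients by

$$v_k = \sum_{i=-N_1}^{0} w_i f_{k-i}, \quad k = 1, \ldots, N_2,$$

so that the feedback term subtracted in the DFE output equals the postcursor ISI at the output of the feedforward filter.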

These coefficients completely eliminate ISI when there are no decision errors, i.e. when $\hat{\hat{d}}_k = d_k$. It was shown by
Salz [10] that the resulting minimum MSE is

$$J_{\min} = \exp\left( T_s \int_{-.5/T_s}^{.5/T_s} \ln\left[ \frac{N_0}{F_\Sigma(f) + N_0} \right] df \right).$$
In general the MMSE associated with a DFE is much lower than that of a linear equalizer, if the impact of feedback
errors is ignored. 
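
As a hedged illustration of the MMSE design equations for the DFE, the sketch below builds the matrix of coefficients $q_{li}$, solves for the feedforward taps, and then forms the feedback taps so that they cancel the postcursor ISI at the feedforward output. It assumes a causal, finite-length discrete-time channel $f_0, \ldots, f_L$ with $f_n = 0$ elsewhere; the function name `mmse_dfe_taps` and its argument names are hypothetical.

```python
import numpy as np

def mmse_dfe_taps(f, n1, n2, n0):
    """MMSE-DFE taps for an assumed causal channel f[0..L] (sketch).

    Returns (w, v): w[j] holds w_i for i = -n1 + j, v[j] holds v_{j+1}.
    """
    f = np.asarray(f, dtype=complex)

    def tap(n):
        # Channel sample f_n, zero outside the assumed support 0..L.
        return f[n] if 0 <= n < len(f) else 0.0

    idx = list(range(-n1, 1))  # l, i = -N1, ..., 0
    q = np.zeros((len(idx), len(idx)), dtype=complex)
    for a, l in enumerate(idx):
        for b, i in enumerate(idx):
            # q_li = sum_{j=0}^{-l} f_j^* f_{j+l-i} + N0 * delta[l-i]
            q[a, b] = sum(np.conj(tap(j)) * tap(j + l - i) for j in range(0, -l + 1))
            if l == i:
                q[a, b] += n0
    rhs = np.array([np.conj(tap(-l)) for l in idx])
    w = np.linalg.solve(q, rhs)
    # Feedback taps: v_k = sum_{i=-N1}^{0} w_i f_{k-i}, k = 1..N2
    v = np.array([sum(w[i + n1] * tap(k - i) for i in idx) for k in range(1, n2 + 1)])
    return w, v

# Usage: assumed channel 1 + 0.5 z^{-1}, two feedforward taps, one feedback tap.
w, v = mmse_dfe_taps([1.0, 0.5], n1=1, n2=1, n0=0.1)
print(np.round(w, 4), np.round(v, 4))
```

In the degenerate case of one feedforward tap and no noise, the solution reduces to $w_0 = 1$ and $v_1 = f_1$, matching the earlier remark that the feedback filter mirrors the channel.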

DFEs exhibit feedback errors if $\hat{\hat{d}}_k \ne d_k$, since the ISI subtracted by the feedback path is not the true ISI
corresponding to $\{d_n\}$. This error therefore propagates to later bit decisions. Moreover, this error propagation
cannot be improved through channel coding, since the feedback path operates on coded channel symbols before
decoding. That is because the ISI must be subtracted immediately, which doesn't allow for any decoding delay.
The error propagation therefore seriously degrades performance on channels with low SNR. This can be addressed
by introducing some delay in the feedback path to allow for channel decoding [11] or through turbo equalization,
described in the next section. A systematic treatment of the DFE with coding can be found in [12, 13]. Moreover,
the DFE structure can be generalized to encompass MIMO channels [14].

11.7 Other Equalization Methods 

Although MLSE is the optimal form of equalization, its complexity precludes its widespread use. There has been
much work on reducing the complexity of the MLSE [1, Chapter 10.4]. Most of these techniques either reduce
the number of surviving sequences in the Viterbi algorithm or reduce the number of symbols spanned by the ISI
through preprocessing or decision-feedback in the Viterbi detector. These reduced complexity equalizers have
better performance versus complexity tradeoffs than the other equalization techniques, and achieve performance
close to that of the optimal MLSE with significantly less complexity.

The turbo decoding principle introduced in Chapter 8.5 can also be used in equalizer design [15, 16]. The
resulting design is called a turbo equalizer. A turbo equalizer iterates between a MAP equalizer and a decoder to
determine the transmitted symbol. The MAP equalizer computes the a posteriori probability (APP) of the
transmitted symbol given the past channel outputs. The decoder computes the log likelihood ratio (LLR) associated with
the transmitted symbol given past channel outputs. The APP and LLR comprise the soft information exchanged
between the equalizer and decoder in the turbo iteration. After some number of iterations, the turbo equalizer
converges on its estimate of the transmitted symbol.

If the channel is known at the transmitter, then the transmitter can pre-equalize the transmitted signal by 
passing it through a filter that effectively inverts the channel frequency response. Since the channel inversion 
occurs in the transmitter rather than the receiver, there is no noise enhancement. It is difficult to pre-equalize in 
a time-varying channel since the transmitter must have an accurate estimate of the channel, but this approach is 
practical to implement in relatively static wireline channels. A problem with this approach is that the channel 
inversion can increase the dynamic range of the transmitted signal, which can result in distortion or inefficiency 
from the amplifier. This problem has been addressed through a precoding technique called Tomlinson-Harashima 
precoding [17, 18]. 
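
To make the idea behind Tomlinson-Harashima precoding concrete, here is a minimal sketch under strong assumptions: BPSK symbols, a known causal channel with leading tap equal to 1, no noise, and a modulo interval of $[-2, 2)$. The helper names `mod_fold` and `thp_precode` are hypothetical. The transmitter subtracts the ISI that the channel is about to add and folds the result back into a bounded interval, so the transmitted amplitude cannot grow the way it can with plain channel inversion; the receiver applies the same folding to recover the symbol.

```python
import numpy as np

def mod_fold(u, m):
    """Fold u into the interval [-m, m)."""
    return (u + m) % (2 * m) - m

def thp_precode(d, f, m=2.0):
    """Precode symbols d for an assumed causal channel f with f[0] == 1."""
    x = np.zeros(len(d))
    for k in range(len(d)):
        # ISI the channel will add from previously transmitted samples.
        isi = sum(f[n] * x[k - n] for n in range(1, len(f)) if k - n >= 0)
        x[k] = mod_fold(d[k] - isi, m)
    return x

# Usage with an assumed channel 1 + 0.5 z^{-1}.
rng = np.random.default_rng(0)
d = rng.choice([-1.0, 1.0], size=32)
f = [1.0, 0.5]
x = thp_precode(d, f)
y = np.convolve(x, f)[:len(d)]   # noiseless channel output
d_rx = mod_fold(y, 2.0)          # receiver-side modulo
print(np.allclose(d_rx, d), np.max(np.abs(x)))
```

The transmitted samples stay within the modulo interval by construction, which is the dynamic range benefit the text attributes to this form of precoding.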

11.8 Adaptive Equalizers: Training and Tracking 

All of the equalizers described so far are designed based on a known value of the composite channel response
$h(t) = g(t) * c(t)$. Since the channel $c(t)$ is generally not known when the receiver is designed, the equalizer
must be tunable so it can adjust to different values of $c(t)$. Moreover, since in wireless channels $c(t) = c(\tau, t)$
will change over time, the system must periodically estimate the channel $c(t)$ and update the equalizer coefficients
accordingly. This process is called equalizer training or adaptive equalization [20, 19]. The equalizer can also use
the detected data to adjust the equalizer coefficients. This process is called equalizer tracking. Blind equalizers
do not use training: they learn the channel response via the detected data only [21, 22, 23, 24].

During training, the coefficients of the equalizer are updated at time $k$ based on a known training sequence
$[d_{k-M}, \ldots, d_k]$ that has been sent over the channel. The length $M$ of the training sequence depends on the number
of equalizer coefficients that must be determined and the convergence speed of the training algorithm. Note that
the equalizer must be retrained when the channel decorrelates, i.e. at least every $T_c$ seconds where $T_c$ is the channel
coherence time. Thus, if the training algorithm is slow relative to the channel coherence time then the channel may
change before the equalizer can learn the channel. Specifically, if $MT_s > T_c$ then the channel will decorrelate
before the equalizer has finished training. In this case equalization is not an effective countermeasure for ISI, and
some other technique (e.g. multicarrier modulation or CDMA) is needed.

Let $\{\hat{d}_k\}$ denote the bit decisions output from the equalizer given a transmitted training sequence $\{d_k\}$. Our
goal is to update the $N = 2L + 1$ equalizer coefficients at time $k+1$ based on the training sequence we have received up to
time $k$. We will denote these updated coefficients as $\{w_{-L}(k+1), \ldots, w_L(k+1)\}$. We will use the MMSE as
our criterion to update these coefficients, i.e. we will choose $\{w_{-L}(k+1), \ldots, w_L(k+1)\}$ as the coefficients that
minimize the MSE between $d_k$ and $\hat{d}_k$. Recall that $\hat{d}_k = \sum_{i=-L}^{L} w_i(k)\, y_{k-i}$, where $y_k = y[k]$ is the output of the
sampler in Figure 11.3 at time $k$ with the known training sequence as input. The $\{w_{-L}(k+1), \ldots, w_L(k+1)\}$
that minimize MSE are obtained via a Wiener filter [1, 5]. Specifically,

$$\mathbf{w}(k+1) = \{w_{-L}(k+1), \ldots, w_L(k+1)\} = \mathbf{R}^{-1}\mathbf{p}, \tag{11.37}$$

where $\mathbf{p} = d_k [y_{k+L} \ldots y_{k-L}]^T$ and

$$\mathbf{R} = \begin{bmatrix}
|y_{k+L}|^2 & y_{k+L}\, y^*_{k+L-1} & \cdots & y_{k+L}\, y^*_{k-L} \\
y_{k+L-1}\, y^*_{k+L} & |y_{k+L-1}|^2 & \cdots & y_{k+L-1}\, y^*_{k-L} \\
\vdots & \ddots & \ddots & \vdots \\
y_{k-L}\, y^*_{k+L} & \cdots & \cdots & |y_{k-L}|^2
\end{bmatrix}. \tag{11.38}$$

Note that the optimal tap updates in this case require a matrix inversion, which requires $N^2$ to $N^3$ multiply
operations on each iteration (each symbol time $T_s$). However, the convergence of this algorithm is very fast: it
typically converges in around $N$ symbol times for $N$ the number of equalizer tap weights.
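
The matrix solve in (11.37) can be sketched as follows. Two assumptions are made for the illustration and are not taken from the text: $\mathbf{R}$ and $\mathbf{p}$ are estimated as sample averages over the whole training window rather than from a single time $k$ (a single outer product would not be invertible), and the conjugation convention is the one that solves the least-squares normal equations for $\hat{d}_k = \sum_i w_i y_{k-i}$. The helper names `windows` and `wiener_taps` and the test channel are hypothetical.

```python
import numpy as np

def windows(y, L):
    """Row k holds [y_{k+L}, ..., y_{k-L}], with zeros beyond the ends of y."""
    y_pad = np.concatenate([np.zeros(L), y, np.zeros(L)])
    return np.array([y_pad[k:k + 2 * L + 1][::-1] for k in range(len(y))])

def wiener_taps(y, d, L):
    """Least-squares (sample-average Wiener) taps w = R^{-1} p."""
    Y = windows(np.asarray(y, dtype=complex), L)
    R = Y.conj().T @ Y / len(d)            # sample estimate of the correlation matrix
    p = Y.conj().T @ np.asarray(d) / len(d)  # sample cross-correlation with training symbols
    return np.linalg.solve(R, p)           # one N x N solve per update

# Usage: BPSK training sequence through an assumed channel 1 + 0.4 z^{-1}.
rng = np.random.default_rng(1)
d = rng.choice([-1.0, 1.0], size=300)
y = np.convolve(d, [1.0, 0.4])[:len(d)]
w = wiener_taps(y, d, L=3)
d_hat = windows(y, 3) @ w
print(np.mean(np.abs(d - d_hat) ** 2))
```

The cost of the `np.linalg.solve` call is what drives the $N^2$ to $N^3$ multiply count quoted above.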

If complexity is an issue then the large number of multiply operations needed to do MMSE training can be
prohibitive. A simpler technique is the least mean square (LMS) algorithm [?]. In this algorithm the tap weight
vector $\mathbf{w}(k+1)$ is updated linearly as

$$\mathbf{w}(k+1) = \mathbf{w}(k) + \Delta\, \varepsilon_k\, [y^*_{k+L} \ldots y^*_{k-L}], \tag{11.39}$$

where $\varepsilon_k = d_k - \hat{d}_k$ is the error between the bit decisions and the training sequence and $\Delta$ is the step size of the
algorithm, which is a parameter that can be chosen. The choice of $\Delta$ dictates the convergence speed and stability
of the algorithm. For small values of $\Delta$ convergence is very slow, as it takes many more than $N$ bits for the
algorithm to converge to the proper equalizer coefficients. However, if $\Delta$ is chosen to be large then the algorithm can
go unstable, basically skipping over the desired tap weights at every iteration. Thus, for good performance of the
LMS algorithm $\Delta$ is typically small and convergence is typically slow. However, the LMS algorithm exhibits
significantly reduced complexity compared to the MMSE algorithm since the tap updates only require approximately
$2N + 1$ multiply operations per iteration. Thus, the complexity is linear in the number of tap weights. Other
algorithms, such as the recursive least squares (RLS), square-root least squares, and Fast Kalman, provide various tradeoffs
in terms of complexity and performance that lie between the two extremes of the LMS algorithm (slow
convergence but low complexity) and the MMSE algorithm (fast convergence but very high complexity). A description
of these other algorithms is given in [1]. Table 11.1 summarizes the specific number of multiply operations and the
relative convergence rate of all these algorithms.
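
A minimal sketch of the LMS update (11.39) is given below, assuming a noiseless BPSK training sequence and a two-tap channel chosen only for illustration; the function name `lms_train` and the step size value are assumptions, not recommendations.

```python
import numpy as np

def lms_train(y, d, L, step):
    """Train a (2L+1)-tap linear equalizer with the LMS update (sketch).

    Returns the final tap vector and the squared error at each iteration.
    """
    w = np.zeros(2 * L + 1, dtype=complex)
    err = np.zeros(len(d))
    y_pad = np.concatenate([np.zeros(L), np.asarray(y, dtype=complex), np.zeros(L)])
    for k in range(len(d)):
        window = y_pad[k:k + 2 * L + 1][::-1]   # [y_{k+L}, ..., y_{k-L}]
        d_hat = np.dot(w, window)               # equalizer output at time k
        eps = d[k] - d_hat                      # training error
        w = w + step * eps * np.conj(window)    # update uses about 2N+1 multiplies
        err[k] = abs(eps) ** 2
    return w, err

# Usage: assumed channel 1 + 0.4 z^{-1}, 7 taps, small step size.
rng = np.random.default_rng(0)
d = rng.choice([-1.0, 1.0], size=2000)
y = np.convolve(d, [1.0, 0.4])[:len(d)]
w, err = lms_train(y, d, L=3, step=0.02)
print(err[:20].mean(), err[-200:].mean())
```

Rerunning the usage lines with a larger `step` speeds up the initial convergence until the recursion becomes unstable, which is the tradeoff described in the paragraph above.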

Note that the symbol decisions $\hat{d}_k$ output from the equalizer are typically passed through a threshold detector
to round the decision to the nearest constellation point. The resulting roundoff error can be used to adjust the
equalizer coefficients during data transmission. This is called equalizer tracking. Tracking is based on the premise
that if the roundoff error is nonzero then the equalizer is not perfectly trained, and the roundoff error can be used to
adjust the channel estimate inherent in the equalizer. The procedure works as follows. The equalizer output bits $\hat{d}_k$
and threshold detector output bits $\hat{\hat{d}}_k$ are used to adjust an estimate of the baseband equivalent composite channel
$H(z)$. In particular, the coefficients of $H(z)$ are adjusted to minimize the MSE between $\hat{d}_k$ and $\hat{\hat{d}}_k$, using the same
MMSE procedures described earlier. The updated version of $H(z)$ is then taken to equal the composite channel
and used to update the equalizer coefficients accordingly. More details can be found in [1, 5].

A summary of the training and tracking characteristics for the different algorithms as a function of the number 
of taps N is given in Table 11.1. 



Algorithm              # of multiply operations   Complexity   Convergence          Tracking
LMS                    2N + 1                     Low          Slow (> 10 N T_s)    Poor
MMSE                   N^2 to N^3                 Very High    Fast (≈ N T_s)       Good
RLS                    2.5N^2 + 4.5N              High         Fast (≈ N T_s)       Good
Fast Kalman DFE        20N + 5                    Fairly Low   Fast (≈ N T_s)       Good
Square Root RLS DFE    1.5N^2 + 6.5N              High         Fast (≈ N T_s)       Good

Table 11.1: Equalizer Training and Tracking Characteristics



Note that the Fast Kalman and Square Root RLS may be unstable in their convergence and tracking, which is the 
price paid for their fast convergence with relatively low complexity. 



Example 11.5: Consider a 5 tap equalizer that must retrain every $.5T_c$, where $T_c$ is the coherence time of the
channel. Assume the transmitted signal is BPSK with a rate of 1 Mbps for both data and training sequence transmission.
Compare the length of training sequence required for the LMS equalizer versus the Fast Kalman DFE. For an 80
Hz Doppler, by how much is the data rate reduced in order to do periodic training for each of these equalizers?
How many operations does each require for this training?

Solution: The equalizers must retrain every $.5T_c = .5/B_d = .5/80 = 6.25$ msec. From the table, for a data
rate of $R_b = 1/T_b = 1$ Mbps, the LMS algorithm requires $10NT_b = 50 \times 10^{-6}$ seconds to train, and the Fast
Kalman DFE requires $NT_b = 5 \times 10^{-6}$ seconds to train. If training occurs every 6.25 msec, the fraction of time
the LMS algorithm uses for training is $50 \times 10^{-6}/6.25 \times 10^{-3} = .008$. Thus, the effective data rate becomes
$(1 - .008)R_b = .992$ Mbps. The fraction of time used by the Fast Kalman DFE for training is $5 \times 10^{-6}/6.25 \times
10^{-3} = .0008$, resulting in an effective data rate of $(1 - .0008)R_b = .9992$ Mbps. The LMS algorithm requires
approximately $2N + 1 = 11$ multiply operations per training iteration, whereas the Fast Kalman DFE requires
$20N + 5 = 105$ operations per iteration, an order of magnitude more than the LMS algorithm. With processor technology
today, this is not a significant difference in terms of processor requirements.
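
The arithmetic in Example 11.5 can be checked with a short script. The training lengths ($10N$ bits for LMS, $N$ bits for the Fast Kalman DFE) and the per-iteration multiply counts are read from Table 11.1; the per-period totals computed at the end are an extrapolation that assumes one tap update per training bit, which the example itself does not state.

```python
N_TAPS = 5
DOPPLER_HZ = 80.0
BIT_RATE = 1e6                        # 1 Mbps BPSK, so T_b = 1 microsecond

bit_time = 1.0 / BIT_RATE
retrain_period = 0.5 / DOPPLER_HZ     # 0.5 * T_c with T_c taken as 1/B_d

train_bits = {"LMS": 10 * N_TAPS, "Fast Kalman DFE": N_TAPS}
mults_per_iteration = {"LMS": 2 * N_TAPS + 1, "Fast Kalman DFE": 20 * N_TAPS + 5}

overhead = {}
effective_rate_mbps = {}
total_mults = {}
for name, bits in train_bits.items():
    overhead[name] = bits * bit_time / retrain_period
    effective_rate_mbps[name] = (1.0 - overhead[name]) * BIT_RATE / 1e6
    # Assumption: one tap update per training bit.
    total_mults[name] = bits * mults_per_iteration[name]
    print(name, overhead[name], effective_rate_mbps[name], total_mults[name])
```

Under that assumption the two algorithms need a similar number of multiplies per training period, because the Fast Kalman DFE does more work per iteration but trains on far fewer bits.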
Chapter 11 Problems

1. Design a continuous time passband equalizer $H_{eq}(f)$ to completely remove the ISI introduced by a channel
with frequency response $H(f) = 1/f$. Assume your transmitted signal has a (passband) bandwidth of 100
KHz and the carrier frequency is 100 MHz. Assuming a channel with AWGN of PSD $N_0$, find the noise
power at the output of your equalizer within the 100 KHz bandwidth of interest. Will this equalizer improve
system performance?



2. This problem investigates the interference generated by ISI, and the noise enhancement that occurs in
zero-forcing equalization. Consider two multipath channels, where the first channel has impulse response profile

$$h_1(t) = \begin{cases} 1 & 0 \le t < T_m \\ 0 & \text{else} \end{cases}$$

and the second channel has impulse response

$$h_2(t) = e^{-t/T_m}, \quad 0 \le t < \infty.$$



(a) Assume that the transmitted signal $s(t)$ is an infinite sequence of impulses with amplitude $A$ and time
separation $T_b = T_m/2$: $s(t) = \sum_{n=-\infty}^{\infty} A\,\delta(t - nT_b)$. Calculate the average ISI power over a bit time.

(b) Let $T_m = 10\,\mu$sec. Suppose a BPSK signal is transmitted over a channel with impulse response $h_1(t)$.
What maximum data rate can be sent over the channel with zero ISI under BPSK modulation with
rectangular pulse shaping of pulse width $T = 1\,\mu$sec? How would this answer change if the baseband
signal bandwidth was restricted to 100 KHz?



3. Consider a channel with impulse response

$$h(t) = \begin{cases} e^{-t/\tau} & t \ge 0, \\ 0 & \text{else}, \end{cases}$$

where $\tau = 6\,\mu$sec. The channel also has AWGN with power spectral density $N_0$.

(a) What is the frequency response of a continuous-time zero-forcing linear equalizer for this channel?
(Assume no matched filter and pulse shaping.)

(b) Suppose we transmit a 30 KHz baseband signal over this channel. Assume that the frequency response of the signal is
a rectangular pulse shape. What is the ratio of SNR with equalization to SNR without equalization in
the bandwidth of our transmitted signal? Hint: Recall that a stationary random process with power
spectral density $S(f)$ has total power $\int S(f)\,df$, and if this process is passed through a filter $G(f)$, the
output process has power spectral density $S(f)|G(f)|^2$.

(c) Approximate the MMSE equalizer for this channel using a discrete-time transversal filter with 3 taps.
Use any approximation method you like, as long as it reasonably approximates the time-domain
response of the MMSE equalizer.

4. Consider an FIR ZF equalizer with tap weights $w_i = c_i$, where $\{c_i\}$ is the inverse z-transform of $1/F(z)$.
Show that this choice of tap weights minimizes

$$\left| \frac{1}{F(z)} - \left( w_0 + w_1 z^{-1} + \cdots + w_N z^{-N} \right) \right|^2$$

at $z = e^{j\omega}$.

5. Consider a communication system where the modulated signal $s(t)$ has power 10 mW, carrier frequency $f_c$,
and passband bandwidth $B_s = 40$ MHz. The signal $s(t)$ passes through a frequency-selective fading channel
with frequency response

$$H(f) = \begin{cases}
1 & f_c - 20\text{ MHz} \le f < f_c - 10\text{ MHz} \\
.5 & f_c - 10\text{ MHz} \le f < f_c \\
2 & f_c \le f < f_c + 10\text{ MHz} \\
.25 & f_c + 10\text{ MHz} \le f < f_c + 20\text{ MHz} \\
0 & \text{else}
\end{cases}$$

The received signal is $y(t) = s(t) * h(t) + n(t)$, where $n(t)$ is AWGN with PSD $N_0 = 10^{-12}$ W/Hz.

(a) Suppose $y(t)$ is passed through a continuous-time passband ZF equalizer. Find the frequency response
$H_{eq}(f)$ for this equalizer within the bandwidth of interest ($f_c \pm 20$ MHz).

(b) For the equalizer of part (a), find the SNR at the equalizer output.

(c) Suppose the symbol time for $s(t)$ is $T_s = .5/B_s$ and assume no restrictions on the constellation size.
Find the maximum data rate that can be sent over the channel with the ZF equalizer of part (a) such
that $P_b < 10^{-3}$.

6. Consider an ISI channel with received signal after transmission through the channel given by

$$y(t) = \sum_{i=-\infty}^{\infty} x_i f(t - iT),$$

where $x_i = \pm 1$ and $f(t)$ is the combined baseband impulse response of the pulse shaping filter and channel.
Assume $f(t) = \sin(\pi t/T)/(\pi t/T)$, which satisfies the Nyquist criterion for zero ISI. There are two
difficulties with this pulse shape: first, it has a rectangular spectrum, which is difficult to implement in practice.
In addition, the tails of the pulse decay as $1/t$, so timing error leads to a sequence of ISI samples which do
not converge. For parts (a), (b) and (c) of the problem below we make the assumption that $f(t) = 0$ for
$|t| > NT$, where $N$ is a positive integer. This is not strictly correct, since it would imply that $f(t)$ is both
time-limited and bandlimited. However, it is a reasonable approximation in practice.

(a) Show that the folded spectrum of $f(t)$ is flat.

(b) Suppose that due to timing error, the signal is sampled at $t = kT + t_0$, where $t_0 < T$. Calculate the
response $y_k = y(kT + t_0)$ and separate your answer into the desired term and the ISI terms.

(c) Assume that the polarities of the $x_i$ are such that every term in the ISI is positive (worst-case ISI).
Under this assumption show that the ISI term from part (b) is

$$\text{ISI} \approx \frac{2}{\pi} \sin(\pi t_0/T) \sum_{n=1}^{N} \frac{n}{n^2 - t_0^2/T^2},$$

and therefore ISI $\to \infty$ as $N \to \infty$.

7. Let $g(t) = \text{sinc}(t/T_s)$, $|t| < T_s$. Find the matched filter $g_m(t)$ for $g(t)$. Find the noise whitening filter
$1/G_m^*(1/z^*)$ for this system that must be used in an MMSE equalizer to whiten the noise.

8. Show that the minimum MSE (11.22) for an IIR MMSE equalizer can be expressed in terms of the folded
spectrum $F_\Sigma(f)$ as

$$J_{\min} = T_s \int_{-.5/T_s}^{.5/T_s} \frac{N_0}{F_\Sigma(f) + N_0}\, df.$$
9. Show that the gradient of tap weights associated with the MMSE equalizer is given by

$$\nabla_{\mathbf{w}} J = \left( \frac{\partial J}{\partial w_0}, \ldots, \frac{\partial J}{\partial w_N} \right) = 2\mathbf{w}^T \mathbf{M}_v - 2\mathbf{v}_d.$$

Set this equal to zero and solve for the optimal tap weights to obtain

$$\mathbf{w}_{opt} = (\mathbf{M}_v^T)^{-1} \mathbf{v}_d^H.$$

10. Show that the MMSE $J_{\min}$ for an IIR MMSE equalizer, given by (11.27), satisfies $0 \le J_{\min} \le 1$.

11. Compare the value of the minimum MSE, under both MMSE equalization and DF equalization, for a
channel with 3 tap discrete-time equivalent model $C(z) = 1 + .5z^{-1} + .3z^{-2}$.

12. This problem investigates equalization for ultrawideband systems. The basic premise of these systems is
to spread a data signal and its corresponding power over a very wide bandwidth so that the power per Hz
of signal is small (typically below the noise floor). Thus such systems can coexist with existing systems
without causing them much interference. Consider a UWB system with BPSK modulation. The data bits
are modulated with a rectangular pulse $g(t)$ where $g(t)$ has a very narrow time duration $T$ as compared to
the bit time $T_b$. For this problem we assume $T = 10^{-9}$. Thus a UWB signal with BPSK modulation would
have the form $s(t) = \sum_n d_n g(t - nT_b)$, where $d_n$ takes the value $\pm 1$ and $T_b \gg T$ is the bit time. A sketch
of $s(t)$ with a data sequence of alternating 1s and 0s is shown in the figure below.

(a) For the figure shown above what is the approximate bandwidth of $s(t)$ if $T_b = 10^{-5}$?

(b) One of the selling points of UWB signals is that they do not experience flat fading in typical channels.
Consider a single bit transmission $s(t) = d_0 g(t)$. Suppose $s(t)$ is transmitted through a channel that
follows a two ray model $h(t) = \alpha_0 \delta(t) + \alpha_1 \delta(t - \tau)$. Sketch the channel output for $\tau \ll T$ and
$\tau \gg T$. Which of your two sketches is more likely to depict the output of a real wireless channel?
Why does this imply that UWB signals don't typically experience flat fading?

(c) Consider a channel with a multipath delay spread of $T_m = 20\,\mu$s. For the figure shown above, what is
the EXACT maximum data rate that can be sent over this channel with no ISI? Is the bandwidth of $s(t)$
in the figure above less than the channel coherence bandwidth at this data rate?

(d) Let $F(z) = \alpha_0 + \alpha_1 z^{-1} + \alpha_2 z^{-2}$ denote the combined baseband impulse response of the transmitter,
channel, and matched filter in a UWB system. Find a 2 tap digital equalizer $H_{eq}(z) = w_0 + w_1 z^{-1}$
that approximates an IIR zero forcing equalizer for $F(z)$. Any reasonable approximation is fine as long
as you justify it.







13. This problem illustrates the noise enhancement of zero-forcing equalizers, and how this enhancement can
be mitigated using an MMSE approach. Consider a frequency-selective fading channel with baseband
frequency response

$$H(f) = \begin{cases}
1 & 0 \le |f| < 10\text{ KHz} \\
1/2 & 10\text{ KHz} \le |f| < 20\text{ KHz} \\
1/3 & 20\text{ KHz} \le |f| < 30\text{ KHz} \\
1/4 & 30\text{ KHz} \le |f| < 40\text{ KHz} \\
1/5 & 40\text{ KHz} \le |f| < 50\text{ KHz} \\
0 & \text{else}
\end{cases}$$

The frequency response is symmetric in positive and negative frequencies. Assume an AWGN channel with
noise PSD $N_0 = 10^{-9}$.

(a) Find a ZF analog equalizer that completely removes the ISI introduced by $H(f)$.

(b) Find the total noise power at the output of the equalizer from part (a).

(c) Assume a MMSE analog equalizer of the form $H_{eq}(f) = 1/(H(f) + \alpha)$. Find the total noise power at
the output of this equalizer for an AWGN input with PSD $N_0$ for $\alpha = .5$ and for $\alpha = 1$.

(d) Describe qualitatively two effects on a signal that is transmitted over channel $H(f)$ and then passed
through the MMSE equalizer $H_{eq}(f) = 1/(H(f) + \alpha)$ with $\alpha > 0$. What design considerations should
go into the choice of $\alpha$?

(e) What happens to the total noise power for the MMSE equalizer in part (c) as $\alpha \to \infty$? What is the
disadvantage of letting $\alpha \to \infty$ in this equalizer design?

(f) For the equalizer designed in part (d), if the system has a data rate of 100 Kbps, and your equalizer
requires a training sequence of 1000 bits to train, what is the maximum channel Doppler such that the
equalizer coefficients converge before the channel decorrelates?

14. Why does an equalizer that tracks the channel during data transmission still need to train periodically? Name 
2 benefits of tracking. 

15. Assume a 4 tap equalizer which must retrain every $.5T_c$, where $T_c$ is the channel coherence time. If a
DSP chip can perform 10 million multiplications per second, and the convergence rates for the LMS DFE
algorithm and the RLS algorithm are 1000 iterations (bit times) and 50 iterations, respectively, then what is
the maximum data rate for both equalizers assuming BPSK modulation and Doppler spread $B_d = 100$ Hz?
Repeat for $B_d = 1000$ Hz. Assume that the transmitting speed is equal for training sequences and for
information sequences.

16. In this problem we find the procedure for updating the channel estimate during tracking. Find the formula
for updating the channel coefficients corresponding to the channel $H(z)$ based on minimizing the MSE
between $\hat{d}_k$ and $\hat{\hat{d}}_k$.

17. Ultrawideband (UWB) systems spread a data signal and its corresponding power over a very wide bandwidth
so that the power per Hz of signal is small (typically below the noise floor). Thus such systems can
coexist with existing systems without causing them much interference. Consider a UWB system with BPSK
modulation. The data bits are modulated with a rectangular pulse $g(t)$ where $g(t)$ has a very narrow time
duration $T$ as compared to the bit time $T_b$. For this problem we assume $T = 10^{-9}$. Thus a UWB signal
with BPSK modulation would have the form $s(t) = \sum_n d_n g(t - nT_b)$, where $d_n$ takes the value $\pm 1$ and
$T_b \gg T$ is the bit time. A sketch of $s(t)$ with a data sequence of alternating 1s and 0s is shown in the figure
below.

(a) For the figure shown above what is the approximate bandwidth of $s(t)$ if $T_b = 10^{-5}$?

(b) One of the selling points of UWB signals is that they do not experience flat fading in typical channels.
Consider a single bit transmission $s(t) = d_0 g(t)$. Suppose $s(t)$ is transmitted through a channel that
follows a two ray model $h(t) = \alpha_0 \delta(t) + \alpha_1 \delta(t - \tau)$. Sketch the channel output for $\tau \ll T$ and
$\tau \gg T$. Which of your two sketches is more likely to depict the output of a real wireless channel?
Why does this imply that UWB signals don't typically experience flat fading?

(c) Consider a channel with a multipath delay spread of $T_m = 20\,\mu$s. For the figure shown above, what is
the EXACT maximum data rate that can be sent over this channel with no ISI? Is the bandwidth of $s(t)$
in the figure above less than the channel coherence bandwidth at this data rate?

(d) Let $F(z) = \alpha_0 + \alpha_1 z^{-1} + \alpha_2 z^{-2}$ denote the combined baseband impulse response of the transmitter,
channel, and matched filter in a UWB system. Find a 2 tap digital equalizer $H_{eq}(z) = w_0 + w_1 z^{-1}$
that approximates an IIR zero forcing equalizer for $F(z)$. Any reasonable approximation is fine as long
as you justify it.

(e) For the equalizer designed in part (d), if the system has a data rate of 100 Kbps, and your equalizer
requires a training sequence of 1000 bits to train, what is the maximum channel Doppler such that the
equalizer coefficients converge before the channel decorrelates?
Chapter 12

Multicarrier Modulation 



The basic idea of multicarrier modulation is to divide the transmitted bitstream into many different substreams and
send these over many different subchannels. Typically the subchannels are orthogonal under ideal propagation
conditions. The data rate on each of the subchannels is much less than the total data rate, and the corresponding
subchannel bandwidth is much less than the total system bandwidth. The number of substreams is chosen to ensure
that each subchannel has a bandwidth less than the coherence bandwidth of the channel, so the subchannels
experience relatively flat fading. Thus, the ISI on each subchannel is small. The subchannels in multicarrier modulation
need not be contiguous, so a large continuous block of spectrum is not needed for high rate multicarrier
communications. Moreover, multicarrier modulation is efficiently implemented digitally. In this discrete implementation,
called orthogonal frequency division multiplexing (OFDM), the ISI can be completely eliminated through the use
of a cyclic prefix.

Multicarrier modulation is currently used in many wireless systems. However, it is not a new technique: it 
was first used for military HF radios in the late 1950’s and early 1960’s. Starting around 1990 [1], multicarrier 
modulation has been used in many diverse wired and wireless applications, including digital audio and video 
broadcasting in Europe [3], digital subscriber lines (DSL) using discrete multitone [5, 12], and the most recent 
generation of wireless LANs [26, 28]. There are also a number of newly emerging uses for multicarrier techniques, 
including fixed wireless broadband services [27, 14], mobile wireless broadband known as FLASH-OFDM [13], 
and even for ultrawideband radios, where multiband OFDM is one of the two competing proposals for the IEEE 
802.15 ultrawideband standard. Multicarrier modulation is also a candidate for the air interface in next generation 
cellular systems [18, 32]. 

The multicarrier technique can be implemented in multiple ways, including vector coding [17] and OFDM [7],
all of which are discussed in this chapter. These techniques have subtle differences, but are all based on the same
premise of breaking a wideband channel into multiple parallel narrowband channels by means of an orthogonal
channel partition.

There is some debate as to whether multicarrier or single carrier modulation is better for ISI channels with
delay spreads on the order of the symbol time. It is claimed in [3] that for some mobile radio applications,
single carrier with equalization has roughly the same performance as multicarrier modulation with channel coding,
frequency-domain interleaving, and weighted maximum-likelihood decoding. Adaptive loading, which has the
potential to significantly improve multicarrier performance [8], was not taken into account in [3]. But there are
other problems with multicarrier modulation that impair its performance, most significantly frequency offset and
timing jitter, which degrade the orthogonality of the subchannels. In addition, the peak-to-average power ratio of
multicarrier is significantly higher than that of single carrier systems, which is a serious problem when nonlinear
amplifiers are used. Tradeoffs between multicarrier and single carrier block transmission systems with respect to
these impairments are discussed in [9].
Despite these challenges, multicarrier techniques are common in high data rate wireless systems with
moderate to large delay spread, as they have significant advantages over time-domain equalization. In particular, the
number of taps required for an equalizer with good performance in a high data rate system is typically large. Thus,
these equalizers are highly complex. Moreover, it is difficult to maintain accurate weights for a large number of
equalizer taps in a rapidly varying channel. For these reasons, most emerging high rate wireless systems use either
multicarrier modulation or spread spectrum instead of equalization to compensate for ISI.

12.1 Data Transmission using Multiple Carriers 

The simplest form of multicarrier modulation divides the data stream into multiple substreams to be transmitted
over different orthogonal subchannels centered at different subcarrier frequencies. The number of substreams
is chosen to make the symbol time on each substream much greater than the delay spread of the channel or,
equivalently, to make the substream bandwidth less than the channel coherence bandwidth. This ensures that the
substreams will not experience significant ISI.

Consider a linearly-modulated system with data rate $R$ and passband bandwidth $B$. The coherence bandwidth
for the channel is assumed to be $B_c < B$, so the signal experiences frequency-selective fading. The basic premise
of multicarrier modulation is to break this wideband system into $N$ linearly-modulated subsystems in parallel,
each with subchannel bandwidth $B_N = B/N$ and data rate $R_N \approx R/N$. For $N$ sufficiently large, the subchannel
bandwidth $B_N = B/N \ll B_c$, which ensures relatively flat fading on each subchannel. This can also be seen in
the time domain: the symbol time $T_N$ of the modulated signal in each subchannel is inversely proportional to the
subchannel bandwidth, $T_N \approx 1/B_N$. So $B_N \ll B_c$ implies that $T_N \approx 1/B_N \gg 1/B_c \approx T_m$, where $T_m$ denotes the delay spread
of the channel. Thus, if $N$ is sufficiently large, the symbol time is much bigger than the delay spread, so each
subchannel experiences little ISI degradation.
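
The sizing argument above can be turned into a small calculation. The sketch below picks the smallest $N$ for which the subchannel bandwidth is at least a chosen factor below the coherence bandwidth. The approximations $B_c \approx 1/T_m$ and $T_N \approx 1/B_N$ follow the text; the function name, the `margin` parameter, and the numbers in the usage lines are hypothetical.

```python
import math

def subchannels_for_flat_fading(total_bw_hz, delay_spread_s, margin=10.0):
    """Smallest N with B_N = B/N <= B_c / margin, taking B_c as 1/T_m.

    Returns (N, subchannel bandwidth in Hz, subchannel symbol time in s).
    """
    # B/N <= 1/(margin * T_m)  is equivalent to  N >= margin * B * T_m
    n = math.ceil(margin * total_bw_hz * delay_spread_s)
    sub_bw = total_bw_hz / n
    symbol_time = 1.0 / sub_bw          # T_N taken as 1/B_N
    return n, sub_bw, symbol_time

# Usage: assumed 1 MHz system bandwidth and 12.34 microsecond delay spread.
n, sub_bw, t_n = subchannels_for_flat_fading(1e6, 12.34e-6)
print(n, sub_bw, t_n, t_n / 12.34e-6)
```

The last printed ratio shows how many delay spreads fit in one subchannel symbol time, which is the quantity that must be large for the ISI on each subchannel to be small.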

Figure 12.1 illustrates a multicarrier transmitter¹. The bit stream is divided into $N$ substreams via a
serial-to-parallel converter. The $n$th substream is linearly-modulated (typically via QAM or PSK) relative to the subcarrier
frequency $f_n$ and occupies passband bandwidth $B_N$. We assume coherent demodulation of the subcarriers so the
subcarrier phase is neglected in our analysis. If we assume raised cosine pulses for $g(t)$ we get a symbol time
$T_N = (1 + \beta)/B_N$ for each substream, where $\beta$ is the rolloff factor of the pulse shape. The modulated signals
associated with all the subchannels are summed together to form the transmitted signal, given as

$$s(t) = \sum_{i=0}^{N-1} s_i\, g(t) \cos(2\pi f_i t + \phi_i), \tag{12.1}$$

where $s_i$ is the complex symbol associated with the $i$th subcarrier and $\phi_i$ is the phase offset of the $i$th carrier. For
nonoverlapping subchannels we set $f_i = f_0 + i(B_N)$, $i = 0, \ldots, N-1$. The substreams then occupy orthogonal
subchannels with passband bandwidth $B_N$, yielding a total passband bandwidth $NB_N = B$ and data rate $NR_N \approx
R$. Thus, this form of multicarrier modulation does not change the data rate or signal bandwidth relative to the
original system, but it almost completely eliminates ISI for $B_N \ll B_c$.

The receiver for this multicarrier modulation is shown in Figure 12.2. Each substream is passed through a
narrowband filter to remove the other substreams, demodulated, and combined via a parallel-to-serial converter
to form the original data stream. Note that the $i$th subchannel will be affected by flat fading corresponding to a
channel gain $\alpha_i = H(f_i)$.

Although this simple type of multicarrier modulation is easy to understand, it has several significant
shortcomings. First, in a realistic implementation, subchannels will occupy a larger bandwidth than under ideal
raised cosine pulse shaping since the pulse shape must be time-limited. Let $\epsilon/T_N$ denote the additional
bandwidth required due to time-limiting of these pulse shapes. The subchannels must then be separated by
$(1 + \beta + \epsilon)/T_N$, and since the multicarrier system has $N$ subchannels, the bandwidth penalty for time
limiting is $\epsilon N/T_N$. In particular, the total required bandwidth for nonoverlapping subchannels is

$$B = \frac{N(1 + \beta + \epsilon)}{T_N}. \tag{12.2}$$

Thus, this form of multicarrier modulation can be spectrally inefficient. Additionally, near-ideal (and hence
expensive) lowpass filters will be required to maintain the orthogonality of the subcarriers at the receiver. Perhaps
most importantly, this scheme requires $N$ independent modulators and demodulators, which entails significant
expense, size, and power consumption. The next section presents a modulation method that allows subcarriers to
overlap and removes the need for tight filtering. Section 12.4 presents the discrete implementation of multicarrier
modulation, which eliminates the need for multiple modulators and demodulators.

¹ In practice the complex symbol $s_i$ would have its real part transmitted over the in-phase signaling branch and
its imaginary part transmitted over the quadrature signaling branch. For simplicity we illustrate multicarrier based
on sending a complex symbol along the in-phase signaling branch.

Figure 12.1: Multicarrier Transmitter.



Example 12.1: Consider a multicarrier system with a total passband bandwidth of 1 MHz. Suppose the system
operates in a city with channel delay spread $T_m = 20$ μs. How many subchannels are needed to obtain
approximately flat fading in each subchannel?

Solution: The channel coherence bandwidth is $B_c = 1/T_m = 1/0.00002 = 50$ kHz. To ensure flat fading on
each subchannel, we take $B_N = B/N = 0.1 B_c \ll B_c$. Thus, $N = B/(0.1 B_c) = 1000000/5000 = 200$
subchannels are needed to ensure flat fading on each subchannel. In discrete implementations of multicarrier, $N$
is typically chosen to be a power of two so that the DFT and IDFT can be computed with efficient FFT algorithms,
in which case $N = 256$ for this set of parameters.
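The arithmetic of Example 12.1 can be sketched in a few lines of Python. This is only an illustration: the
function name `subchannels_for_flat_fading` and the `fraction` parameter (the ratio $B_N/B_c$, taken as 0.1 in the
example) are assumptions introduced here, not part of the text.

```python
import math

def subchannels_for_flat_fading(total_bw_hz, delay_spread_s, fraction=0.1):
    """Return (N, N rounded up to a power of two) so that B_N = fraction * B_c."""
    coherence_bw_hz = 1.0 / delay_spread_s        # B_c ~ 1/T_m
    target_bw_hz = fraction * coherence_bw_hz     # desired subchannel bandwidth B_N
    # Subtract a tiny tolerance so floating-point noise does not bump the ceiling.
    n_exact = math.ceil(total_bw_hz / target_bw_hz - 1e-9)
    n_pow2 = 1 << (n_exact - 1).bit_length()      # smallest power of two >= n_exact
    return n_exact, n_pow2

# Parameters of Example 12.1: B = 1 MHz, T_m = 20 microseconds.
n_exact, n_pow2 = subchannels_for_flat_fading(1e6, 20e-6)
print(n_exact, n_pow2)
```

A smaller `fraction` gives flatter subchannels at the cost of a larger $N$.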
Figure 12.2: Multicarrier Receiver.



Example 12.2: Consider a multicarrier system with $T_N = 0.2$ ms: $T_N \gg T_m$ for $T_m$ the channel delay spread,
so each subchannel experiences minimal ISI. Assume the system has $N = 128$ subchannels. If raised cosine pulses
with $\beta = 1$ are used, and the additional bandwidth due to time limiting required to ensure minimal power
outside the signal bandwidth is $\epsilon/T_N$ with $\epsilon = 0.1$, then what is the total bandwidth of the system?

Solution: From (12.2),

$$B = \frac{N(1 + \beta + \epsilon)}{T_N} = \frac{128(1 + 1 + 0.1)}{0.0002} = 1.344 \text{ MHz}.$$

We will see in the next section that the bandwidth requirements for this system can be substantially reduced by
overlapping subchannels.



12.2 Multicarrier Modulation with Overlapping Subchannels 



We can improve on the spectral efficiency of multicarrier modulation by overlapping the subchannels. The
subcarriers must still be orthogonal so that they can be separated out by the demodulator in the receiver. The
subcarriers $\{\cos(2\pi(f_0 + i/T_N)t + \phi_i),\ i = 0, 1, 2, \ldots\}$ form a set of (approximately) orthogonal basis
functions on the interval $[0, T_N]$ for any set of subcarrier phase offsets $\{\phi_i\}$ since

$$\begin{aligned}
&\int_0^{T_N} \cos(2\pi(f_0 + i/T_N)t + \phi_i)\cos(2\pi(f_0 + j/T_N)t + \phi_j)\,dt \\
&\quad= \int_0^{T_N} 0.5\cos(2\pi(i-j)t/T_N + \phi_i - \phi_j)\,dt
 + \int_0^{T_N} 0.5\cos(2\pi(2f_0 + (i+j)/T_N)t + \phi_i + \phi_j)\,dt \\
&\quad\approx \int_0^{T_N} 0.5\cos(2\pi(i-j)t/T_N + \phi_i - \phi_j)\,dt \\
&\quad= 0.5\,T_N\,\delta(i-j),
\end{aligned} \tag{12.3}$$

where the approximation follows from the fact that the second integral in (12.3) is approximately zero for
$f_0 T_N \gg 1$. Moreover, it is easily shown that no set of subcarriers with a smaller frequency separation forms an
orthogonal set on $[0, T_N]$ for arbitrary subcarrier phase offsets. This implies that the minimum frequency
separation required for subcarriers to remain orthogonal over the symbol interval $[0, T_N]$ is $1/T_N$. Since the
carriers are orthogonal, from Chapter 5.1 the set of functions
$\{g(t)\cos(2\pi(f_0 + i/T_N)t + \phi_i),\ i = 0, 1, \ldots, N-1\}$ also forms a set of (approximately) orthonormal
basis functions for appropriately chosen baseband pulse shapes $g(t)$: the family of raised cosine pulses is a
common choice for this pulse shape. Given this orthonormal basis set, even if the subchannels overlap, the
modulated signals transmitted in each subchannel can be separated out in the receiver, as we now show.
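The orthogonality claim in (12.3) can be checked numerically. The sketch below is only an illustration under
assumed values ($T_N = 1$, $f_0 = 50$, and arbitrary phases chosen here); the helper `inner_product` is a
hypothetical name, and the integral is approximated with a simple midpoint rule.

```python
import math

def inner_product(f_a, f_b, phi_a, phi_b, t_n, steps=20000):
    """Midpoint-rule approximation of the integral over [0, t_n] of the two cosines."""
    dt = t_n / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += math.cos(2 * math.pi * f_a * t + phi_a) * math.cos(2 * math.pi * f_b * t + phi_b)
    return total * dt

T_N = 1.0    # symbol time (assumed value)
F0 = 50.0    # lowest subcarrier frequency, so that f_0 * T_N >> 1 (assumed value)

# Spacing 1/T_N with arbitrary phases: (approximately) orthogonal.
full_spacing = inner_product(F0, F0 + 1.0 / T_N, 0.3, 1.1, T_N)
# Same subcarrier: energy 0.5 * T_N.
same_carrier = inner_product(F0, F0, 0.3, 0.3, T_N)
# Spacing 0.5/T_N: orthogonal for equal phases, but not for a 90-degree phase offset.
half_equal_phase = inner_product(F0, F0 + 0.5 / T_N, 0.0, 0.0, T_N)
half_quadrature = inner_product(F0, F0 + 0.5 / T_N, 0.0, math.pi / 2, T_N)
```

The half-spacing case illustrates why the statement is qualified by "for arbitrary subcarrier phase offsets": a
smaller separation can work for particular phases but not in general.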

Consider a multicarrier system where each subchannel is modulated using raised cosine pulse shapes with
rolloff factor $\beta$. The passband bandwidth of each subchannel is then $B_N = (1 + \beta)/T_N$. The $i$th subcarrier
frequency is set to $(f_0 + i/T_N)$, $i = 0, 1, \ldots, N-1$, for some $f_0$, so the subcarriers are separated by $1/T_N$.
However, the passband bandwidth of each subchannel is $B_N = (1 + \beta)/T_N > 1/T_N$ for $\beta > 0$, so the
subchannels overlap. Excess bandwidth due to time windowing will increase the subcarrier bandwidth by an
additional $\epsilon/T_N$. However, $\beta$ and $\epsilon$ do not affect the total system bandwidth due to the subchannel
overlap except in the first and last subchannels, as illustrated in Figure 12.3. The total system bandwidth with
overlapping subchannels is given by

$$B = \frac{N + \beta + \epsilon}{T_N} \approx \frac{N}{T_N}, \tag{12.4}$$

where the approximation holds for $N$ large. Thus, with $N$ large, the impact of $\beta$ and $\epsilon$ on the total system
bandwidth is negligible, in contrast to the required bandwidth $B = N(1 + \beta + \epsilon)/T_N$ when the subchannels do
not overlap.




Figure 12.3: Multicarrier with Overlapping Subcarriers. 



Example 12.3: Compare the required bandwidth of a multicarrier system with overlapping subchannels versus
nonoverlapping subchannels using the same parameters as in Example 12.2.

Solution: In the prior example $T_N = 0.2$ ms, $N = 128$, $\beta = 1$, and $\epsilon = 0.1$. With overlapping subchannels,
from (12.4),

$$B = \frac{N + \beta + \epsilon}{T_N} = \frac{128 + 1 + 0.1}{0.0002} = 645.5 \text{ kHz} \approx N/T_N = 640 \text{ kHz}.$$

By comparison, in the prior example the required bandwidth with nonoverlapping subchannels was shown to be
1.344 MHz, more than double the required bandwidth when the subchannels overlap.
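The comparison between (12.2) and (12.4) for the parameters of Examples 12.2 and 12.3 can be reproduced with the
short sketch below. The function names are introduced here purely for illustration.

```python
def bandwidth_nonoverlapping(n, beta, eps, t_n):
    """Total bandwidth from (12.2): every subchannel pays for beta and eps."""
    return n * (1 + beta + eps) / t_n

def bandwidth_overlapping(n, beta, eps, t_n):
    """Total bandwidth from (12.4): only the edge subchannels pay for beta and eps."""
    return (n + beta + eps) / t_n

# Parameters of Examples 12.2 and 12.3: N = 128, beta = 1, eps = 0.1, T_N = 0.2 ms.
b_non = bandwidth_nonoverlapping(128, 1.0, 0.1, 0.2e-3)
b_over = bandwidth_overlapping(128, 1.0, 0.1, 0.2e-3)
print(b_non, b_over, b_non / b_over)
```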



Clearly, in order to separate out overlapping subcarriers, a different receiver structure is needed than the one
shown in Figure 12.2. In particular, overlapping subchannels are demodulated with the receiver structure shown
in Figure 12.4, which demodulates the appropriate symbol without interference from overlapping subchannels.
Specifically, if the effects of the channel $h(t)$ and noise $n(t)$ are neglected then for received signal $s(t)$ given by
(12.1), the input to each symbol demapper in Figure 12.4 is

$$\hat{s}_i = \int_0^{T_N} \left( \sum_{j=0}^{N-1} s_j g(t) \cos(2\pi f_j t + \phi_j) \right) g(t) \cos(2\pi f_i t + \phi_i)\,dt$$

$$= \sum_{j=0}^{N-1} s_j \int_0^{T_N} g^2(t) \cos(2\pi(f_0 + j/T_N)t + \phi_j) \cos(2\pi(f_0 + i/T_N)t + \phi_i)\,dt$$

$$= \sum_{j=0}^{N-1} s_j \delta(j - i) \tag{12.5}$$

$$= s_i, \tag{12.6}$$

where (12.5) follows from the fact that the functions $\{g(t)\cos(2\pi f_j t + \phi_j)\}$ form a set of orthonormal basis
functions on $[0, T_N]$. If the channel and noise effects are included, the symbol in the $i$th subchannel is scaled by
the channel gain $\alpha_i = H(f_i)$ and corrupted by the noise sample, so $\hat{s}_i = \alpha_i s_i + n_i$, where $n_i$ is AWGN
with power $N_0 B_N$. This multicarrier system makes much more efficient use of bandwidth than systems with
nonoverlapping subcarriers. However, since the subcarriers overlap, their orthogonality is compromised by timing
and frequency offset. These effects, even when relatively small, can significantly degrade performance, as they
cause subchannels to interfere with each other. These effects are discussed in more detail in Section 12.5.2.







Figure 12.4: Multicarrier Receiver for Overlapping Subcarriers. 



12.3 Mitigation of Subcarrier Fading 

The advantage of multicarrier modulation is that each subchannel is relatively narrowband, which mitigates the
effect of delay spread. However, each subchannel experiences flat fading, which can cause large BERs on some
of the subchannels. In particular, if the transmit power on subcarrier $i$ is $P_i$, and the fading on that subcarrier
is $\alpha_i$, then the received SNR is $\gamma_i = \alpha_i^2 P_i/(N_0 B_N)$, where $B_N$ is the bandwidth of each subchannel.
If $\alpha_i$ is small then the received SNR on the $i$th subchannel is quite low, which can lead to a high BER on that
subchannel. Moreover, in wireless channels the $\alpha_i$'s will vary over time according to a given fading distribution,
resulting in the same performance degradation associated with flat fading for single carrier systems discussed in
Chapter 6. Since flat fading can seriously degrade performance in each subchannel, it is important to compensate
for flat fading in the subchannels. There are several techniques for doing this, including coding with interleaving
over time and frequency, frequency equalization, precoding, and adaptive loading, all described in subsequent
sections. Coding with interleaving is the most common, and has been adopted as part of the European standards
for digital audio and video broadcasting [3, 4]. Moreover, in rapidly changing channels it is difficult to estimate
the channel at the receiver and feed this information back to the transmitter. Without channel information at the
transmitter, precoding and adaptive loading cannot be done, so only coding with interleaving is effective at fading
mitigation.

12.3.1 Coding with Interleaving over Time and Frequency 

The basic idea in coding with interleaving over time and frequency is to encode data bits into codewords, interleave 
the resulting coded bits over both time and frequency, and then transmit the coded bits over different subchannels 
such that the coded bits within a given codeword all experience independent fading [19]. If most of the subchannels 
have a high SNR, the codeword will have most coded bits received correctly, and the errors associated with the few 
bad subchannels can be corrected. Coding across subchannels basically exploits the frequency diversity inherent to 
a multicarrier system to correct for errors. This technique only works well if there is sufficient frequency diversity 
across the total system bandwidth. If the coherence bandwidth of the channel is large, then the fading across 
subchannels will be highly correlated, which will significantly reduce the effect of coding. Most coding for OFDM 
assumes channel information in the decoder. Channel estimates are typically obtained by a two-dimensional pilot
symbol transmission over both time and frequency [20].

Note that coding with frequency/time interleaving takes advantage of the fact that the data on all the sub- 
carriers is associated with the same user, and can therefore be jointly processed. The other techniques for fading 
mitigation discussed in subsequent sections are all basically flat fading compensation techniques, which apply 
equally to multicarrier systems as well as narrowband flat fading single carrier systems [3, 2]. 

12.3.2 Frequency Equalization 

In frequency equalization the flat fading $\alpha_i$ on the $i$th subchannel is basically inverted in the receiver [3].
Specifically, the received signal is multiplied by $1/\alpha_i$, which gives a resultant signal power
$\alpha_i^2 P_i/\alpha_i^2 = P_i$. While this removes the impact of flat fading on the signal, it enhances the noise.
Specifically, the incoming noise signal is also multiplied by $1/\alpha_i$, so the noise power becomes
$N_0 B_N/\alpha_i^2$ and the resultant SNR on the $i$th subchannel after frequency equalization is the same as before
equalization. Therefore, frequency equalization does not really change the performance degradation associated
with subcarrier flat fading.
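A small numerical check makes the point concrete: scaling signal and noise by the same factor leaves their ratio
unchanged. The function names and the sample values below are assumptions chosen only for illustration.

```python
import math

def snr_before_equalization(alpha, p_i, n0, b_n):
    """Received SNR gamma_i = alpha_i^2 P_i / (N_0 B_N)."""
    return alpha ** 2 * p_i / (n0 * b_n)

def snr_after_equalization(alpha, p_i, n0, b_n):
    """SNR after multiplying the received signal (and noise) by 1/alpha_i."""
    signal_power = (alpha ** 2 * p_i) / alpha ** 2   # fading removed: equals P_i
    noise_power = (n0 * b_n) / alpha ** 2            # noise enhanced by 1/alpha_i^2
    return signal_power / noise_power

# Hypothetical subchannel: deep fade alpha = 0.1, P_i = 1 mW, N_0 = 1e-9 W/Hz, B_N = 5 kHz.
before = snr_before_equalization(0.1, 1e-3, 1e-9, 5e3)
after = snr_after_equalization(0.1, 1e-3, 1e-9, 5e3)
print(before, after)
```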

12.3.3 Precoding 

Precoding uses the same idea as frequency equalization, except that the fading is inverted at the transmitter instead
of the receiver [21]. This technique requires that the transmitter have knowledge of the subchannel flat fading gains
$\alpha_i$, $i = 0, \ldots, N-1$, which must be obtained through estimation [22]. In this case, if the desired received
signal power in the $i$th subchannel is $P_i$, and the channel introduces a flat-fading gain $\alpha_i$ in the $i$th
subchannel, then under precoding the power transmitted in the $i$th subchannel is $P_i/\alpha_i^2$. The subchannel signal
is corrupted by flat fading with gain $\alpha_i$, so the received signal power is $P_i \alpha_i^2/\alpha_i^2 = P_i$, as desired.
Note that the channel inversion takes place at the transmitter instead of the receiver, so the noise power remains
as $N_0 B_N$. Precoding is quite common on wireline multicarrier systems like HDSL. There are two main problems
with precoding in a wireless setting. First, precoding is basically channel inversion, and we know from Section
6.3.5 that inversion is not power-efficient in fading channels. In fact, an infinite amount of power is needed to do
channel inversion on a Rayleigh fading channel. The other problem with precoding is the need for accurate
channel estimates at the transmitter, which are difficult to obtain in a rapidly fading channel.
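The power cost of channel inversion is easy to see numerically. The sketch below uses a handful of hypothetical
subchannel gains (not taken from the text) and computes the transmit power $P_i/\alpha_i^2$ that precoding would
require on each one.

```python
import math

def precoding_transmit_powers(gains, desired_rx_power):
    """Transmit power P_i / alpha_i^2 needed so each subchannel receives desired_rx_power."""
    return [desired_rx_power / (alpha * alpha) for alpha in gains]

# Hypothetical flat-fading gains, from no fade to a 40 dB fade.
gains = [1.0, 0.5, 0.1, 0.01]
tx_powers = precoding_transmit_powers(gains, 1.0)
rx_powers = [p * alpha * alpha for p, alpha in zip(tx_powers, gains)]
total_tx_power = sum(tx_powers)
print(tx_powers, total_tx_power)
```

A single deep fade dominates the total transmit power, which is the practical face of the power-inefficiency
noted above.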



12.3.4 Adaptive Loading 



Adaptive loading is based on the adaptive modulation techniques discussed in Chapter 9. It is commonly used
on slowly changing channels like digital subscriber lines [8], where channel estimates at the transmitter can be
obtained fairly easily. The basic idea is to vary the data rate and power assigned to each subchannel relative to
that subchannel gain. As in the case of precoding, this requires knowledge of the subchannel fading
$\{\alpha_i,\ i = 0, \ldots, N-1\}$ at the transmitter. In adaptive loading, the power and rate on each subchannel are
adapted to maximize the total rate of the system using adaptive modulation such as variable-rate variable-power
MQAM.

Before investigating adaptive modulation, let us consider the capacity of the multicarrier system with $N$
independent subchannels of bandwidth $B_N$ and subchannel gain $\{\alpha_i,\ i = 0, \ldots, N-1\}$. Assuming a total
power constraint $P$, this capacity is given by²

$$C = \max_{P_i:\ \sum_i P_i = P}\ \sum_{i=0}^{N-1} B_N \log_2\left(1 + \frac{\alpha_i^2 P_i}{N_0 B_N}\right). \tag{12.7}$$

The power allocation $P_i$ that maximizes this expression is a water-filling over frequency given by Equation (4.24):

$$\frac{P_i}{P} = \begin{cases} 1/\gamma_0 - 1/\gamma_i, & \gamma_i \ge \gamma_0 \\ 0, & \gamma_i < \gamma_0 \end{cases} \tag{12.8}$$

for some cutoff value $\gamma_0$, where $\gamma_i = \alpha_i^2 P/(N_0 B_N)$. The cutoff value is obtained by substituting the
power adaptation formula into the power constraint. The capacity then becomes

$$C = \sum_{i:\ \gamma_i \ge \gamma_0} B_N \log_2(\gamma_i/\gamma_0). \tag{12.9}$$

Applying the variable-rate variable-power MQAM modulation scheme described in Chapter 9 to the subchannels,
the total data rate is given by

$$R = \sum_{i=0}^{N-1} B_N \log_2\left(1 + K \gamma_i P_i/P\right), \tag{12.10}$$

where $K = -1.5/\ln(5 P_b)$ for $P_b$ the desired target BER in each subchannel. Optimizing this expression relative
to the $P_i$'s yields the optimal power allocation

$$\frac{K P_i}{P} = \begin{cases} 1/\gamma_K - 1/\gamma_i, & \gamma_i \ge \gamma_K \\ 0, & \gamma_i < \gamma_K \end{cases} \tag{12.11}$$

and corresponding data rate

$$R = B_N \sum_{i:\ \gamma_i \ge \gamma_K} \log_2(\gamma_i/\gamma_K), \tag{12.12}$$

where $\gamma_K$ is a cutoff fade depth dictated by the power constraint $P$ and $K$.
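The water-filling allocation in (12.8) can be computed by searching for the cutoff $\gamma_0$ that makes the power
fractions sum to one. The sketch below is one simple way to do this, assuming the values $\gamma_i$ are known and
strictly positive; the function names and the sample $\gamma_i$ values are introduced here for illustration only.

```python
import math

def water_filling(gammas):
    """Return (gamma_0, [P_i / P]) satisfying (12.8) with the fractions summing to 1."""
    ordered = sorted(gammas, reverse=True)
    gamma_0 = None
    # Try using the k strongest subchannels, starting with all of them.
    for k in range(len(ordered), 0, -1):
        inv_cutoff = (1.0 + sum(1.0 / g for g in ordered[:k])) / k   # candidate 1/gamma_0
        if inv_cutoff - 1.0 / ordered[k - 1] >= 0.0:                 # weakest used channel gets power >= 0
            gamma_0 = 1.0 / inv_cutoff
            break
    fractions = [max(0.0, 1.0 / gamma_0 - 1.0 / g) for g in gammas]
    return gamma_0, fractions

def capacity(gammas, gamma_0, b_n):
    """Capacity from (12.9), summing only over subchannels above the cutoff."""
    return sum(b_n * math.log2(g / gamma_0) for g in gammas if g >= gamma_0)

# Hypothetical subchannel SNRs gamma_i = alpha_i^2 P / (N_0 B_N).
gammas = [10.0, 5.0, 1.0, 0.1]
gamma_0, fractions = water_filling(gammas)
cap = capacity(gammas, gamma_0, b_n=1.0)
print(gamma_0, fractions, cap)
```

Replacing each $\gamma_i$ by $K\gamma_i$ in the same routine gives the allocation for the variable-rate
variable-power MQAM rate in (12.10), since that expression has the same form as (12.7) with an SNR scaled by $K$.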

² As discussed in Chapter 4.3.1, this summation is the exact capacity when the $\alpha_i$'s are independent. However,
in order for the $\alpha_i$'s to be independent, the subchannels must be separated by the coherence bandwidth of the
channel, which would imply that the subchannels are no longer flat fading. Since the subchannels are designed to
be flat fading, the subchannel gains $\{\alpha_i,\ i = 0, \ldots, N-1\}$ will be correlated, in which case the capacity
obtained by summing over the capacity in each subchannel is an upper bound on the true capacity. We will take
this bound to be the actual capacity, since in practice the bound is quite tight.

12.4 Discrete Implementation of Multicarrier



Although multicarrier modulation was invented in the 1950’s, its requirement for separate modulators and de- 
modulators on each subchannel was far too complex for most system implementations at the time. However, the 
development of simple and cheap implementations of the discrete Fourier transform (DFT) and the inverse DFT 
(IDFT) twenty years later, combined with the realization that multicarrier modulation can be implemented with 
these algorithms, ignited its widespread use. In this section, after first reviewing the basic properties of the DFT,
we illustrate OFDM, which implements multicarrier modulation using the DFT and IDFT.



12.4.1 The DFT and its Properties 

Let $x[n]$, $0 \le n \le N-1$, denote a discrete time sequence. The $N$-point DFT of $x[n]$ is defined as [11]

$$\text{DFT}\{x[n]\} = X[i] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x[n] e^{-j 2\pi n i/N}, \quad 0 \le i \le N-1. \tag{12.13}$$

The DFT is the discrete-time equivalent to the continuous-time Fourier transform, as $X[i]$ characterizes the
frequency content of the time samples $x[n]$ associated with the original signal $x(t)$. Both the continuous-time
Fourier transform and the DFT are based on the fact that complex exponentials are eigenfunctions for any linear
system. The sequence $x[n]$ can be recovered from its DFT using the IDFT:

$$\text{IDFT}\{X[i]\} = x[n] = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} X[i] e^{j 2\pi n i/N}, \quad 0 \le n \le N-1. \tag{12.14}$$

The DFT and its inverse are typically performed in hardware using the fast Fourier transform (FFT) and inverse
FFT (IFFT).
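For small $N$, the definitions (12.13) and (12.14) can be evaluated directly. The sketch below is a plain
$O(N^2)$ implementation in pure Python, intended only to mirror the formulas; a practical system would use an FFT.
The test sequence is arbitrary.

```python
import cmath
import math

def dft(x):
    """N-point DFT with the 1/sqrt(N) normalization of (12.13)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * i / n_pts) for n in range(n_pts))
            / math.sqrt(n_pts) for i in range(n_pts)]

def idft(big_x):
    """N-point IDFT with the 1/sqrt(N) normalization of (12.14)."""
    n_pts = len(big_x)
    return [sum(big_x[i] * cmath.exp(2j * cmath.pi * n * i / n_pts) for i in range(n_pts))
            / math.sqrt(n_pts) for n in range(n_pts)]

x = [1.0, 2.0 - 1.0j, 0.0, -1.0 + 0.5j]
big_x = dft(x)
x_back = idft(big_x)

# With this symmetric normalization the transform preserves energy.
energy_time = sum(abs(v) ** 2 for v in x)
energy_freq = sum(abs(v) ** 2 for v in big_x)
```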

When an input data stream $x[n]$ is sent through a linear time-invariant discrete-time channel $h[n]$, the output
$y[n]$ is the discrete-time convolution of the input and the channel impulse response:

$$y[n] = h[n] * x[n] = x[n] * h[n] = \sum_k h[k] x[n-k]. \tag{12.15}$$

The $N$-point circular convolution of $x[n]$ and $h[n]$ is defined as

$$y[n] = x[n] \circledast h[n] = h[n] \circledast x[n] = \sum_k h[k] x[n-k]_N, \tag{12.16}$$

where $[n-k]_N$ denotes $[n-k]$ modulo $N$. In other words, $x[n-k]_N$ is a periodic version of $x[n-k]$ with period
$N$. It is easily verified that $y[n]$ given by (12.16) is also periodic with period $N$. From the definition of the DFT,
circular convolution in time leads to multiplication in frequency:

$$\text{DFT}\{y[n] = x[n] \circledast h[n]\} = X[i]H[i], \quad 0 \le i \le N-1. \tag{12.17}$$

(With the $1/\sqrt{N}$ normalization in (12.13) applied to both sequences, the product carries an additional constant
factor of $\sqrt{N}$; this constant can be absorbed into $H[i]$ and does not affect the argument.) By (12.17), if the
channel and input are circularly convolved and $h[n]$ is known at the receiver, then the original data sequence
$x[n]$ can be recovered by taking the IDFT of $Y[i]/H[i]$, $0 \le i \le N-1$. Unfortunately, the channel output is not a
circular convolution but a linear convolution. However, the linear convolution between the channel input and
impulse response can be turned into a circular convolution by adding a special prefix to the input called a cyclic
prefix, described in the next section.
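The multiplication property (12.17) can be verified numerically. In the sketch below, which is only an
illustration, $X[i]$ and $Y[i]$ use the $1/\sqrt{N}$ normalization of (12.13), while the channel gains $H[i]$ are
computed without that factor (an assumption made here so that the product holds exactly). The sequences are
arbitrary test values.

```python
import cmath
import math

def dft(x):
    """N-point DFT with 1/sqrt(N) normalization, as in (12.13)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * i / n_pts) for n in range(n_pts))
            / math.sqrt(n_pts) for i in range(n_pts)]

def circular_convolve(x, h):
    """N-point circular convolution (12.16); h may be shorter than x."""
    n_pts = len(x)
    return [sum(h[k] * x[(n - k) % n_pts] for k in range(len(h))) for n in range(n_pts)]

def channel_gains(h, n_pts):
    """Unnormalized N-point DFT of the channel taps (zero-padded to length N)."""
    return [sum(h[k] * cmath.exp(-2j * cmath.pi * k * i / n_pts) for k in range(len(h)))
            for i in range(n_pts)]

x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5]
h = [1.0, 0.5, 0.25]

lhs = dft(circular_convolve(x, h))                                   # DFT of the circular convolution
rhs = [xi * hi for xi, hi in zip(dft(x), channel_gains(h, len(x)))]  # X[i] H[i]
```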
12.4.2 The Cyclic Prefix

Consider a channel input sequence $x[n] = x[0], \ldots, x[N-1]$ of length $N$ and a discrete-time channel with
finite impulse response (FIR) $h[n] = h[0], \ldots, h[\mu]$ of length $\mu + 1 = T_m/T_s$, where $T_m$ is the channel delay
spread and $T_s$ the sampling time associated with the discrete time sequence. The cyclic prefix for $x[n]$ is defined
as $\{x[N-\mu], \ldots, x[N-1]\}$: it consists of the last $\mu$ values of the $x[n]$ sequence. For each input sequence
of length $N$, these last $\mu$ samples are appended to the beginning of the sequence. This yields a new sequence
$\tilde{x}[n]$, $-\mu \le n \le N-1$, of length $N + \mu$, where
$\tilde{x}[-\mu], \ldots, \tilde{x}[N-1] = x[N-\mu], \ldots, x[N-1], x[0], \ldots, x[N-1]$, as shown in Figure 12.5. Note that
with this definition, $\tilde{x}[n] = x[n]_N$ for $-\mu \le n \le N-1$, which implies that
$\tilde{x}[n-k] = x[n-k]_N$ for $-\mu \le n-k \le N-1$.



[Figure 12.5 layout: the cyclic prefix x[N-μ], x[N-μ+1], ..., x[N-1] is followed by the original length-N
sequence x[0], x[1], ..., x[N-1]; the last μ symbols are appended to the beginning.]

Figure 12.5: Cyclic Prefix of Length μ.

Suppose $\tilde{x}[n]$ is input to a discrete-time channel with impulse response $h[n]$. The channel output $y[n]$,
$0 \le n \le N-1$, is then

$$\begin{aligned}
y[n] &= \tilde{x}[n] * h[n] \\
&= \sum_{k=0}^{\mu} h[k] \tilde{x}[n-k] \\
&= \sum_{k=0}^{\mu} h[k] x[n-k]_N \\
&= x[n] \circledast h[n],
\end{aligned} \tag{12.18}$$

where the third equality follows from the fact that for $0 \le k \le \mu$, $\tilde{x}[n-k] = x[n-k]_N$ for $0 \le n \le N-1$.
Thus, by appending a cyclic prefix to the channel input, the linear convolution associated with the channel impulse
response $y[n]$ for $0 \le n \le N-1$ becomes a circular convolution. Taking the DFT of the channel output in the
absence of noise then yields

$$Y[i] = \text{DFT}\{y[n] = x[n] \circledast h[n]\} = X[i]H[i], \quad 0 \le i \le N-1, \tag{12.19}$$

and the input sequence $x[n]$, $0 \le n \le N-1$, can be recovered from the channel output $y[n]$, $0 \le n \le N-1$, for
known $h[n]$ by

$$x[n] = \text{IDFT}\{Y[i]/H[i]\} = \text{IDFT}\{\text{DFT}\{y[n]\}/\text{DFT}\{h[n]\}\}. \tag{12.20}$$

Note that $y[n]$, $-\mu \le n \le N-1$, has length $N + \mu$, yet from (12.20) the first $\mu$ samples $y[-\mu], \ldots, y[-1]$
are not needed to recover $x[n]$, $0 \le n \le N-1$, due to the redundancy associated with the cyclic prefix. Moreover,
if we assume that the input $x[n]$ is divided into data blocks of size $N$ with a cyclic prefix appended to each block
to form $\tilde{x}[n]$, then the first $\mu$ samples of $y[n] = h[n] * \tilde{x}[n]$ in a given block are corrupted by ISI associated
with the last $\mu$ samples of $x[n]$ in the prior block, as illustrated in Figure 12.6. The cyclic prefix serves to
eliminate ISI between the data blocks since the first $\mu$ samples of the channel output affected by this ISI can be
discarded without any loss relative to the original information sequence. In continuous time this is equivalent to
using a guard band of duration $T_m$ (the channel delay spread) after every block of $N$ symbols of duration $N T_s$ to
eliminate the ISI between these data blocks.

The benefits of adding a cyclic prefix come at a cost. Since $\mu$ symbols are added to the input data blocks,
there is an overhead of $\mu/N$, resulting in a data rate reduction of $N/(\mu + N)$. The transmit power associated
with sending the cyclic prefix is also wasted since this prefix consists of redundant data. It is clear from Figure 12.6
that any prefix of length $\mu$ appended to input blocks of size $N$ eliminates ISI between data blocks if the first $\mu$
samples of the block are discarded. In particular, the prefix can consist of all zero symbols, in which case although
the data rate is still reduced by $N/(N + \mu)$, no power is used in transmitting the prefix. Tradeoffs associated with
the cyclic prefix versus this all-zero prefix, which is a form of vector coding, are discussed in Section 12.9.
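The effect of the cyclic prefix can be seen on a small example. The sketch below, with an arbitrary input block
and hypothetical channel taps, prepends the prefix, applies an ordinary linear convolution, discards the first
$\mu$ output samples, and compares the remaining $N$ samples with the circular convolution of the original block.

```python
import math

def add_cyclic_prefix(x, mu):
    """Prepend the last mu samples of x to the block."""
    return x[len(x) - mu:] + x

def linear_convolve(x, h):
    """Ordinary (linear) convolution; output length is len(x) + len(h) - 1."""
    out = [0.0] * (len(x) + len(h) - 1)
    for n, x_val in enumerate(x):
        for k, h_val in enumerate(h):
            out[n + k] += x_val * h_val
    return out

def circular_convolve(x, h):
    """N-point circular convolution of the block x with the channel taps h."""
    n_pts = len(x)
    return [sum(h[k] * x[(n - k) % n_pts] for k in range(len(h))) for n in range(n_pts)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # one data block, N = 8
h = [1.0, 0.5, 0.25]                            # hypothetical FIR channel, mu = 2
mu = len(h) - 1

y_lin = linear_convolve(add_cyclic_prefix(x, mu), h)
y_block = y_lin[mu:mu + len(x)]                 # drop the first mu samples, keep N
y_circ = circular_convolve(x, h)
rate_factor = len(x) / (len(x) + mu)            # data rate reduction N / (N + mu)
```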



[Figure 12.6 layout: successive channel-output blocks, each consisting of a cyclic prefix followed by a data block
y[0] ... y[N-1].]

Figure 12.6: ISI Between Data Blocks in Channel Output.

The above analysis motivates the design of OFDM. In OFDM the input data is divided into blocks of size $N$,
each referred to as an OFDM symbol. A cyclic prefix is added to each OFDM symbol to induce circular convolution
of the input and channel impulse response. At the receiver, the output samples affected by ISI between OFDM
symbols are removed. The DFT of the remaining samples is used to recover the original input sequence. The
details of this OFDM system design are given in the next section.



Example 12.4: Consider an OFDM system with total passband bandwidth $B = 1$ MHz assuming $\beta = \epsilon = 0$.
A single carrier system would have symbol time $T_s = 1/B = 1$ μs. The channel has a maximum delay spread
of $T_m = 5$ μs, so with $T_s = 1$ μs and $T_m = 5$ μs there would clearly be severe ISI. Assume an OFDM
system with MQAM modulation applied to each subchannel. To keep the overhead small, the OFDM system
uses $N = 128$ subcarriers to mitigate ISI, so $T_N = N T_s = 128$ μs. The length of the cyclic prefix is set to
$\mu = 8 > T_m/T_s$ to ensure no ISI between OFDM symbols. For these parameters, find the subchannel bandwidth,
the total transmission time associated with each OFDM symbol, the overhead of the cyclic prefix, and the data rate
of the system assuming $M = 16$.

Solution: The subchannel bandwidth is $B_N = 1/T_N = 7.8125$ kHz, so $B_N \ll B_c = 1/T_m = 200$ kHz, ensuring
negligible ISI. The total transmission time for each OFDM symbol is $T = T_N + \mu T_s = 128 + 8 = 136$ μs. The
overhead associated with the cyclic prefix is 8/136 of this transmission time, which is roughly 5.9%. The system
transmits $\log_2 16 = 4$ bits/subcarrier every $T$ seconds, so the data rate is
$128 \times 4/(136 \times 10^{-6}) = 3.76$ Mbps, which is slightly less than $4B$ due to the cyclic prefix overhead.
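The numbers in Example 12.4 follow from a few lines of arithmetic, sketched below. The variable names are
introduced here for illustration.

```python
import math

bandwidth_hz = 1e6            # total passband bandwidth B
t_s = 1.0 / bandwidth_hz      # single-carrier symbol time T_s = 1/B
n_sub = 128                   # number of subcarriers N
mu = 8                        # cyclic prefix length in samples
m_qam = 16                    # constellation size M

t_n = n_sub * t_s                                    # T_N = N T_s
subchannel_bw_hz = 1.0 / t_n                         # B_N = 1/T_N
symbol_time_s = t_n + mu * t_s                       # T = T_N + mu T_s
prefix_fraction = mu / (n_sub + mu)                  # share of T spent on the prefix
data_rate_bps = n_sub * math.log2(m_qam) / symbol_time_s

print(subchannel_bw_hz, symbol_time_s, prefix_fraction, data_rate_bps)
```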



12.4.3 Orthogonal Frequency Division Multiplexing (OFDM) 

The OFDM implementation of multicarrier modulation is shown in Figure 12.7. The input data stream is modulated
by a QAM modulator, resulting in a complex symbol stream $X[0], X[1], \ldots, X[N-1]$. This symbol stream is
passed through a serial-to-parallel converter, whose output is a set of $N$ parallel QAM symbols
$X[0], \ldots, X[N-1]$ corresponding to the symbols transmitted over each of the subcarriers. Thus, the $N$ symbols
output from the serial-to-parallel converter are the discrete frequency components of the OFDM modulator output
$s(t)$. In order to generate $s(t)$, these frequency components are converted into time samples by performing an
inverse DFT on these $N$ symbols, which is efficiently implemented using the IFFT algorithm. The IFFT yields the
OFDM symbol consisting of the sequence $x[n] = x[0], \ldots, x[N-1]$ of length $N$, where

$$x[n] = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} X[i] e^{j 2\pi n i/N}, \quad 0 \le n \le N-1. \tag{12.21}$$

This sequence corresponds to samples of the multicarrier signal: i.e. the multicarrier signal consists of linearly
modulated subchannels, and the right hand side of (12.21) corresponds to samples of a sum of QAM symbols $X[i]$
each modulated by the carrier $e^{j 2\pi i t/T_N}$, $i = 0, \ldots, N-1$. The cyclic prefix is then added to the OFDM
symbol, and the resulting time samples
$\tilde{x}[n] = \tilde{x}[-\mu], \ldots, \tilde{x}[N-1] = x[N-\mu], \ldots, x[N-1], x[0], \ldots, x[N-1]$ are ordered by the
parallel-to-serial converter and passed through a D/A converter, resulting in the baseband OFDM signal
$\tilde{x}(t)$, which is then upconverted to frequency $f_0$.

[Figure 12.7 panels: Transmitter; Receiver.]

Figure 12.7: OFDM with IFFT/FFT Implementation.

The transmitted signal is filtered by the channel impulse response $h(t)$ and corrupted by additive noise, so
that the received signal is $y(t) = \tilde{x}(t) * h(t) + n(t)$. This signal is downconverted to baseband and filtered
to remove the high frequency components. The A/D converter samples the resulting signal to obtain
$y[n] = \tilde{x}[n] * h[n] + \nu[n]$, $-\mu \le n \le N-1$. The prefix of $y[n]$ consisting of the first $\mu$ samples is then
removed. This results in $N$ time samples whose DFT in the absence of noise is $Y[i] = H[i]X[i]$. These time samples
are serial-to-parallel converted and passed through an FFT. This results in scaled versions of the original symbols
$H[i]X[i]$, where $H[i] = H(f_i)$ is the flat-fading channel gain associated with the $i$th subchannel. The FFT output
is parallel-to-serial converted and passed through a QAM demodulator to recover the original data.
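The transmit and receive chain just described can be sketched end to end for a noiseless channel. The example
below is a minimal illustration, not a full modem: it omits the D/A and A/D conversion, up- and downconversion,
and noise, and it assumes the receiver knows the channel taps exactly. The function names, the 4-QAM symbols, and
the channel taps are all hypothetical.

```python
import cmath
import math

def dft(x):
    """Unitary N-point DFT, as in (12.13)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * i / n_pts) for n in range(n_pts))
            / math.sqrt(n_pts) for i in range(n_pts)]

def idft(big_x):
    """Unitary N-point IDFT, as in (12.14) and (12.21)."""
    n_pts = len(big_x)
    return [sum(big_x[i] * cmath.exp(2j * cmath.pi * n * i / n_pts) for i in range(n_pts))
            / math.sqrt(n_pts) for n in range(n_pts)]

def ofdm_transmit(symbols, mu):
    """IDFT the QAM symbols, then prepend the last mu time samples as a cyclic prefix."""
    x = idft(symbols)
    return x[len(x) - mu:] + x

def fir_channel(samples, h):
    """Noiseless linear convolution with the channel taps, truncated to the input length."""
    return [sum(h[k] * samples[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(samples))]

def ofdm_receive(received, h, n_pts, mu):
    """Drop the prefix, take the DFT, and divide out the subchannel gains H[i]."""
    big_y = dft(received[mu:mu + n_pts])
    big_h = [sum(h[k] * cmath.exp(-2j * cmath.pi * k * i / n_pts) for k in range(len(h)))
             for i in range(n_pts)]
    return [big_y[i] / big_h[i] for i in range(n_pts)]

qam_symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
taps = [1.0, 0.5, 0.2]        # hypothetical channel impulse response, so mu = 2
mu = len(taps) - 1

tx_samples = ofdm_transmit(qam_symbols, mu)
received = fir_channel(tx_samples, taps)
recovered = ofdm_receive(received, taps, len(qam_symbols), mu)
```

Because the prefix is at least as long as the channel memory, the recovered symbols match the transmitted ones up
to rounding error.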

The OFDM system effectively decomposes the wideband channel into a set of narrowband orthogonal subchannels
with a different QAM symbol sent over each subchannel. Knowledge of the channel gains $H[i]$,
$i = 0, \ldots, N-1$, is not needed for this decomposition, in the same way that a continuous time channel with
frequency response $H(f)$ can be divided into orthogonal subchannels without knowledge of $H(f)$ by splitting the
total signal bandwidth into nonoverlapping subbands. The demodulator can use the channel gains to recover the
original QAM symbols by dividing out these gains: $X[i] = Y[i]/H[i]$. This process is called frequency
equalization. However, as discussed in Section 12.3.2 for continuous-time OFDM, frequency equalization leads to
noise enhancement, since the noise in the $i$th subchannel is also scaled by $1/H[i]$. Hence, while the effect of flat
fading on $X[i]$ is removed by this equalization, its received SNR is unchanged. Precoding, adaptive loading, and
coding across subchannels, as discussed in Section 12.3, are better approaches to mitigate the effects of flat fading
across subcarriers.

An alternative to using the cyclic prefix is to use a prefix consisting of all zero symbols. In this case the OFDM
symbol consisting of $x[n]$, $0 \le n \le N-1$, is preceded by $\mu$ null samples, as illustrated in Figure 12.8. At the
receiver the "tail" of the ISI associated with the end of a given OFDM symbol is added back in to the beginning of
the symbol, which recreates the effect of a cyclic prefix, so the rest of the OFDM system functions as usual. This
zero prefix reduces the transmit power relative to a cyclic prefix by $N/(\mu + N)$, since the prefix does not require
any transmit power. However, the noise from the received tail is added back into the beginning of the symbol,
which increases the noise power by $(N + \mu)/N$. Thus, the difference in SNR is not significant for the two prefixes.
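The "add the tail back" step for the all-zero prefix can be illustrated on a single block. In the sketch below,
which assumes a noiseless channel with hypothetical taps and an arbitrary input block, the $\mu$ samples that
spill past the end of the block (into the following guard interval) are added to the first $\mu$ samples, and the
result is compared with the circular convolution.

```python
import math

def linear_convolve(x, h):
    """Ordinary (linear) convolution; output length is len(x) + len(h) - 1."""
    out = [0.0] * (len(x) + len(h) - 1)
    for n, x_val in enumerate(x):
        for k, h_val in enumerate(h):
            out[n + k] += x_val * h_val
    return out

def circular_convolve(x, h):
    """N-point circular convolution of the block x with the channel taps h."""
    n_pts = len(x)
    return [sum(h[k] * x[(n - k) % n_pts] for k in range(len(h))) for n in range(n_pts)]

def overlap_add_tail(y_lin, n_pts):
    """Add the samples beyond the block (the ISI tail) back onto the start of the block."""
    block = list(y_lin[:n_pts])
    for n, tail_sample in enumerate(y_lin[n_pts:]):
        block[n] += tail_sample
    return block

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # one OFDM symbol in time, N = 8
h = [1.0, 0.5, 0.25]                            # hypothetical FIR channel, mu = 2

y_lin = linear_convolve(x, h)                   # N + mu samples; last mu fall in the guard interval
restored = overlap_add_tail(y_lin, len(x))
y_circ = circular_convolve(x, h)
```

In a noisy channel the tail samples carry noise as well, which is the source of the noise power increase
mentioned above.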



Send Nothing in Guard Interval 




Figure 12.8: Creating a Circular Channel with an All-Zero Prefix. 



12.4.4 Matrix Representation of OFDM 

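An alternate analysis for OFDM is based on a matrix representation of the system. Consider a discrete-time channel
with FIR $h[n]$, $0 \le n \le \mu$, input $\tilde{x}[n]$, noise $\nu[n]$, and output $y[n] = \tilde{x}[n] * h[n] + \nu[n]$. Denote the
$n$th element of these sequences as $h_n = h[n]$, $x_n = \tilde{x}[n]$, $\nu_n = \nu[n]$, and $y_n = y[n]$. With this notation the
channel output sequence can be written in matrix form as

$$\begin{bmatrix} y_{N-1} \\ y_{N-2} \\ \vdots \\ y_0 \end{bmatrix} =
\begin{bmatrix}
h_0 & h_1 & \cdots & h_\mu & 0 & \cdots & 0 \\
0 & h_0 & \cdots & h_{\mu-1} & h_\mu & \cdots & 0 \\
\vdots & \ddots & \ddots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_0 & \cdots & h_{\mu-1} & h_\mu
\end{bmatrix}
\begin{bmatrix} x_{N-1} \\ \vdots \\ x_0 \\ x_{-1} \\ \vdots \\ x_{-\mu} \end{bmatrix} +
\begin{bmatrix} \nu_{N-1} \\ \nu_{N-2} \\ \vdots \\ \nu_0 \end{bmatrix}, \tag{12.22}$$

which can be written more compactly as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \boldsymbol{\nu}. \tag{12.23}$$

The received symbols $y_{-1}, \ldots, y_{-\mu}$ are discarded since they are affected by ISI in the prior data block, and
they are not needed to recover the input. The last $\mu$ symbols of $\tilde{x}[n]$ correspond to the cyclic prefix:
$x_{-1} = x_{N-1}$, $x_{-2} = x_{N-2}$, $\ldots$, $x_{-\mu} = x_{N-\mu}$. From this it can be shown that the matrix
representation (12.22) is equivalent to the following representation:

$$\begin{bmatrix} y_{N-1} \\ y_{N-2} \\ \vdots \\ y_0 \end{bmatrix} =
\begin{bmatrix}
h_0 & h_1 & \cdots & h_\mu & 0 & \cdots & 0 \\
0 & h_0 & \cdots & h_{\mu-1} & h_\mu & \cdots & 0 \\
\vdots & & \ddots & & \ddots & & \vdots \\
0 & \cdots & 0 & h_0 & \cdots & h_{\mu-1} & h_\mu \\
\vdots & & & & \ddots & & \vdots \\
h_2 & h_3 & \cdots & 0 & \cdots & h_0 & h_1 \\
h_1 & h_2 & \cdots & h_\mu & 0 & \cdots & h_0
\end{bmatrix}
\begin{bmatrix} x_{N-1} \\ x_{N-2} \\ \vdots \\ x_0 \end{bmatrix} +
\begin{bmatrix} \nu_{N-1} \\ \nu_{N-2} \\ \vdots \\ \nu_0 \end{bmatrix}, \tag{12.24}$$

which can be written more compactly as

$$\mathbf{y} = \tilde{\mathbf{H}}\mathbf{x} + \boldsymbol{\nu}. \tag{12.25}$$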

This equivalent model shows that the inserted cyclic prefix allows the channel to be modelled as a circulant
convolution matrix $\tilde{\mathbf{H}}$ over the $N$ samples of interest. The matrix $\tilde{\mathbf{H}}$ is $N \times N$, so it has an
eigenvalue decomposition

$$\tilde{\mathbf{H}} = \mathbf{M} \boldsymbol{\Lambda} \mathbf{M}^H, \tag{12.26}$$

where $\boldsymbol{\Lambda}$ is a diagonal matrix of eigenvalues of $\tilde{\mathbf{H}}$ and $\mathbf{M}^H$ is a unitary matrix whose rows
comprise the eigenvectors of $\tilde{\mathbf{H}}$.

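It is straightforward to show that the DFT operation on $x[n]$ can be represented by the matrix multiplication

\[
\mathbf{X} = \mathbf{Q}\mathbf{x}, \tag{12.27}
\]

where $\mathbf{X} = (X[0], \ldots, X[N-1])^T$, $\mathbf{x} = (x[0], \ldots, x[N-1])^T$, and $\mathbf{Q}$ is an $N \times N$ matrix given by

\[
\mathbf{Q} = \frac{1}{\sqrt{N}}
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^2 & \cdots & W_N^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2}
\end{bmatrix}
\tag{12.28}
\]

for $W_N = e^{-j2\pi/N}$. Since

\[
\mathbf{Q}^{-1} = \mathbf{Q}^H,
\]

the IDFT can be similarly represented as

\[
\mathbf{x} = \mathbf{Q}^{-1}\mathbf{X} = \mathbf{Q}^H\mathbf{X}. \tag{12.29}
\]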

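Let $\mathbf{v}$ be an eigenvector of $\tilde{\mathbf{H}}$ with eigenvalue $\lambda$. Then

\[
\lambda\mathbf{v} = \tilde{\mathbf{H}}\mathbf{v}.
\]

The unitary matrix $\mathbf{M}^H$ has rows that are the eigenvectors of $\tilde{\mathbf{H}}$, i.e. $\lambda_i \mathbf{m}_i^T = \tilde{\mathbf{H}}\mathbf{m}_i^T$ for $i = 0, 1, \ldots, N-1$,
where $\mathbf{m}_i$ denotes the $i$th row of $\mathbf{M}^H$. It can also be shown by induction that the rows of the DFT matrix $\mathbf{Q}$ are
eigenvectors of $\tilde{\mathbf{H}}$, which implies that $\mathbf{Q} = \mathbf{M}^H$ and $\mathbf{Q}^H = \mathbf{M}$. Thus we have that

\begin{align}
\mathbf{Y} &= \mathbf{Q}\mathbf{y} \notag \\
&= \mathbf{Q}[\tilde{\mathbf{H}}\mathbf{x} + \boldsymbol{\nu}] \notag \\
&= \mathbf{Q}[\tilde{\mathbf{H}}\mathbf{Q}^H\mathbf{X} + \boldsymbol{\nu}] \notag \\
&= \mathbf{Q}[\mathbf{M}\boldsymbol{\Lambda}\mathbf{M}^H\mathbf{Q}^H\mathbf{X} + \boldsymbol{\nu}] \tag{12.30} \\
&= \mathbf{Q}\mathbf{M}\boldsymbol{\Lambda}\mathbf{M}^H\mathbf{Q}^H\mathbf{X} + \mathbf{Q}\boldsymbol{\nu} \notag \\
&= \mathbf{M}^H\mathbf{M}\boldsymbol{\Lambda}\mathbf{M}^H\mathbf{M}\mathbf{X} + \mathbf{Q}\boldsymbol{\nu} \notag \\
&= \boldsymbol{\Lambda}\mathbf{X} + \boldsymbol{\nu}_Q, \tag{12.31}
\end{align}

where since $\mathbf{Q}$ is unitary, $\boldsymbol{\nu}_Q = \mathbf{Q}\boldsymbol{\nu}$ has the same noise autocorrelation matrix as $\boldsymbol{\nu}$, and hence is still generally
white and Gaussian, with unchanged noise power. Thus, this matrix analysis also shows that by adding a cyclic
prefix and using the IDFT/DFT, OFDM decomposes an ISI channel into $N$ orthogonal subchannels, and knowledge
of the channel matrix is not needed for this decomposition.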
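The diagonalization in (12.30) and (12.31) can be verified numerically. The sketch below is illustrative only: it assumes a two-tap channel with $N = 8$, orders the vectors from index 0 upward (the text lists them from index $N-1$ downward, which does not change the circulant structure), and uses helper names that are not from the text. It forms $\mathbf{Q}\tilde{\mathbf{H}}\mathbf{Q}^H$ and checks that the result is diagonal with the DFT of the channel taps on its diagonal.

```python
import cmath
import math


def circulant(h, n_sub):
    """N x N circulant channel matrix: entry (n, k) is h[(n - k) mod N]."""
    taps = list(h) + [0.0] * (n_sub - len(h))
    return [[taps[(n - k) % n_sub] for k in range(n_sub)] for n in range(n_sub)]


def dft_matrix(n_sub):
    """Unitary DFT matrix Q of (12.28) with W_N = exp(-j*2*pi/N)."""
    scale = 1.0 / math.sqrt(n_sub)
    return [[scale * cmath.exp(-2j * math.pi * i * n / n_sub) for n in range(n_sub)]
            for i in range(n_sub)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def hermitian(a):
    """Conjugate transpose."""
    return [[complex(a[r][c]).conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]


N = 8
h = [1.0, 0.9]  # assumed example channel
H_circ = circulant(h, N)
Q = dft_matrix(N)
D = matmul(matmul(Q, H_circ), hermitian(Q))

# Subchannel gains predicted by the DFT of the channel taps.
gains = [sum(h[k] * cmath.exp(-2j * math.pi * i * k / N) for k in range(len(h)))
         for i in range(N)]

max_off_diagonal = max(abs(D[r][c]) for r in range(N) for c in range(N) if r != c)
max_diagonal_error = max(abs(D[i][i] - gains[i]) for i in range(N))
print(max_off_diagonal, max_diagonal_error)
```

Both printed values should be at the level of floating-point rounding, which is the numerical counterpart of the statement that each subchannel sees only a scalar gain.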

The matrix representation is also useful in analyzing OFDM systems with multiple antennas. As discussed
in Chapter 10, a MIMO channel is typically represented by an $M_r \times M_t$ matrix, where $M_t$ is the number of
transmit antennas and $M_r$ the number of receive antennas. Thus, an OFDM-MIMO channel with $N$ subchannels,
$M_t$ transmit antennas, $M_r$ receive antennas, and a channel FIR of duration $\mu$ can be represented as

\[
\mathbf{y} = \mathbf{H}\mathbf{x} + \boldsymbol{\nu}, \tag{12.32}
\]

where $\mathbf{y}$ is a vector of dimension $M_rN \times 1$ corresponding to $N$ output time samples at each of the $M_r$ antennas,
$\mathbf{H}$ is an $NM_r \times (N+\mu)M_t$ matrix corresponding to the $N$ flat-fading subchannel gains on each transmit-receive
antenna pair, and $\mathbf{x}$ is a vector of dimension $M_t(N+\mu) \times 1$ corresponding to $N$ input time samples with appended
cyclic prefix of length $\mu$ at each of the $M_t$ transmit antennas. The matrix is in the same form as in the case
of OFDM without multiple antennas, so the same design and analysis applies: with MIMO-OFDM the ISI is
removed by breaking the wideband channel into many narrowband subchannels. Each subchannel experiences flat
fading, so it can be treated as a flat-fading MIMO channel. The capacity of this channel is obtained by applying the
same matrix analysis as for standard MIMO to the augmented channel with MIMO and OFDM [16]. In discrete
implementations the input associated with each transmit antenna is broken into blocks of size $N$, with a cyclic
prefix appended to convert linear convolution to circular and eliminate ISI between input blocks. More details can
be found in [24].

12.4.5 Vector Coding 

In OFDM the $N \times N$ circular convolution channel matrix is decomposed using its eigenvalues and eigenvectors.
Vector coding (VC) is a similar technique whereby the original $N \times (N+\mu)$ channel matrix $\mathbf{H}$ from (12.23) is
decomposed using an SVD, which can be applied to a matrix of any dimension. The SVD decomposition does not
require a cyclic prefix to make the subchannels orthogonal, so it is more efficient than OFDM in terms of energy.
However, it is more complex, and requires knowledge of the channel impulse response for the decomposition, in
contrast to OFDM, which does not require channel knowledge for its decomposition.

The singular value decomposition of $\mathbf{H}$ can be written as

\[
\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H, \tag{12.33}
\]

where $\mathbf{U}$ is $N \times N$ unitary, $\mathbf{V}$ is $(N+\mu) \times (N+\mu)$ unitary, and $\boldsymbol{\Sigma}$ is a diagonal matrix whose $i$th element $\sigma_i$ is
the $i$th singular value of $\mathbf{H}$. The singular values of $\mathbf{H}$ are related to the eigenvalues of $\mathbf{H}\mathbf{H}^H$ by $\sigma_i = \sqrt{\lambda_i}$ for $\lambda_i$
the $i$th eigenvalue of the matrix $\mathbf{H}\mathbf{H}^H$. Because $\mathbf{H}$ is a block-diagonal convolutional matrix, $\mathrm{rank}(\mathbf{H}) = N$, i.e.
$\sigma_i \neq 0\ \forall i$.

In vector coding, as in OFDM, input data symbols are grouped into vectors of $N$ symbols. Let $X_i$ denote the
symbol to be transmitted over the $i$th subchannel and $\mathbf{X} = (X_0, \ldots, X_{N-1})$ denote a vector of these symbols. Each
of the data symbols $X_i$ is multiplied by a column of $\mathbf{V}$ in parallel to form a vector, and the resulting vectors are added together. At the
receiver, the received vector $\mathbf{y}$ is multiplied by each row of $\mathbf{U}^H$ to yield $N$ output symbols $Y_i$, $i = 0, 1, \ldots, N-1$.
This process is illustrated in Figure 12.9, where the multiplication with $\mathbf{V}$ and $\mathbf{U}^H$ performs a similar function as
the transmit precoding and receiver shaping in MIMO systems.




Figure 12.9: Vector Coding. 

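Mathematically, it can be seen that the filtered transmit and received vectors are

\[
\mathbf{x} = \mathbf{V}\mathbf{X}
\]

and

\[
\mathbf{Y} = \mathbf{U}^H\mathbf{y}. \tag{12.34}
\]

As a result, it can be shown through simple linear algebra that the filtered received vector $\mathbf{Y}$ is ISI-free, since

\begin{align}
\mathbf{Y} &= \mathbf{U}^H\mathbf{y} \notag \\
&= \mathbf{U}^H(\mathbf{H}\mathbf{x} + \boldsymbol{\nu}) \notag \\
&= \mathbf{U}^H(\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H)\mathbf{V}\mathbf{X} + \mathbf{U}^H\boldsymbol{\nu} \notag \\
&= \boldsymbol{\Sigma}\mathbf{X} + \mathbf{U}^H\boldsymbol{\nu}. \tag{12.35}
\end{align}

Hence, each element of $\mathbf{X}$ is effectively passed through a scalar channel without ISI, where the scalar gain
of subchannel $i$ is the $i$th singular value of $\mathbf{H}$. Additionally, the new noise vector $\tilde{\boldsymbol{\nu}} = \mathbf{U}^H\boldsymbol{\nu}$ has unchanged noise
variance, since $\mathbf{U}$ is unitary. The resulting received vector is thus given elementwise by

\[
Y_i = \sigma_i X_i + \tilde{\nu}_i, \quad i = 0, 1, \ldots, N-1. \tag{12.36}
\]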



From this analysis and (12.22) we see that the matrix $\mathbf{H}$ is obtained by appending $\mu$ extra symbols to each
block of $N$ data symbols, which are called vector codewords. However, in contrast to OFDM, the SVD
decomposition does not require these extra symbols to have any particular form; they are just inserted to eliminate ISI
between blocks. In particular, these symbols need not be a cyclic prefix, nor must the "tail" be added back in if the
prefix is all zeros. In practice the extra symbols are set to zero to save transmit power, thereby forming a guardband
or null prefix between the vector codeword (VC) symbols, as shown in Figure 12.10.



Figure 12.10: Guard Interval (Null Prefix) in Vector Coding (VC symbols separated by guard intervals).

Vector coding has been proven using information and estimation theory to be the optimal partition of the
$N$-dimensional channel $\mathbf{H}$. Thus, the capacity of any other channel partitioning scheme will be upper bounded
by vector coding. Despite its theoretical optimality and ability to create ISI-free channels with relatively small
overhead and no wasted transmit power, there are a number of important practical problems with vector coding.
The two most important problems are:

1. Complexity. With vector coding, as in simple multichannel modulation, the complexity still scales quickly
with $N$, the number of subcarriers. As seen from Figure 12.9, $N$ transmit precoding and $N$ receive shaping
filters are required to implement vector coding. Furthermore, the complexity of finding the SVD of the
$N \times (N+\mu)$ matrix $\mathbf{H}$ increases rapidly with $N$.

2. SVD and Channel Knowledge. In order to orthogonally partition the channel, the SVD of the channel 
matrix H must be computed. In particular, the precoding filter matrix must be known at the transmitter. 
This means that every time the channel changes, a new SVD must be computed, and the results conveyed to 
the transmitter. Generally, the computational complexity of the SVD and the delay incurred in getting the 
channel information back to the transmitter is prohibitive in wireless systems. Since OFDM can perform 
this decomposition without channel knowledge, OFDM is the method of choice for discrete multicarrier 
modulation in wireless applications. 



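Example 12.5: Consider a simple two-tap discrete-time channel (i.e. $\mu = 1$) described as

\[
H(z) = 1 + 0.9z^{-1}.
\]

Since $\mu = T_m/T_s = 1$, with $N = 8$ we ensure $B_N \approx 1/(NT_s) \ll B_c \approx 1/T_m$. Find the system matrix
representation (12.23) and the singular values of the associated channel matrix $\mathbf{H}$.

Solution: The representation (12.23) for $H(z) = 1 + 0.9z^{-1}$ and $N = 8$ is given by

\[
\begin{bmatrix} y_7 \\ y_6 \\ \vdots \\ y_0 \end{bmatrix}
=
\begin{bmatrix}
1 & 0.9 & 0 & \cdots & 0 \\
0 & 1 & 0.9 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & 0.9
\end{bmatrix}
\begin{bmatrix} x_7 \\ x_6 \\ \vdots \\ x_0 \\ x_{-1} \end{bmatrix}
+
\begin{bmatrix} \nu_7 \\ \nu_6 \\ \vdots \\ \nu_0 \end{bmatrix},
\tag{12.37}
\]

where $\mathbf{H}$ is the $8 \times 9$ matrix shown. The singular values of the matrix $\mathbf{H}$ in (12.37) can be found by a standard computer package (e.g. Matlab) as

\[
\boldsymbol{\Sigma} = \mathrm{diag}(1.87, 1.78, 1.65, 1.46, 1.22, 0.95, 0.66, 0.34).
\]

The precoding and shaping matrices $\mathbf{U}$ and $\mathbf{V}$ are also easily found. Given $\mathbf{U}$, $\mathbf{V}$, and $\boldsymbol{\Sigma}$, this communication is
ISI-free, with the symbols $X_0, X_1, \ldots, X_7$ being multiplied by the corresponding singular values as in (12.36).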
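The singular values in Example 12.5 can be reproduced without an SVD routine. The sketch below rests on one observation that is not stated in the text: for a two-tap channel, $\mathbf{H}\mathbf{H}^H$ is a tridiagonal Toeplitz matrix, whose eigenvalues have the known closed form $a + 2b\cos(k\pi/(N+1))$ with sine-shaped eigenvectors. The code builds $\mathbf{H}$, checks each closed-form eigenpair against $\mathbf{H}\mathbf{H}^H$ directly, and takes square roots. Variable names are illustrative.

```python
import math

N, mu = 8, 1
h = [1.0, 0.9]  # taps of H(z) = 1 + 0.9 z^-1

# N x (N + mu) Toeplitz channel matrix of (12.23): row r holds h starting at column r.
H = [[h[c - r] if 0 <= c - r <= mu else 0.0 for c in range(N + mu)] for r in range(N)]

# Gram matrix G = H H^T (H is real here), which is tridiagonal Toeplitz.
G = [[sum(H[r][k] * H[c][k] for k in range(N + mu)) for c in range(N)] for r in range(N)]
a, b = G[0][0], G[0][1]  # diagonal and first off-diagonal entries

eigenvalues = [a + 2 * b * math.cos(k * math.pi / (N + 1)) for k in range(1, N + 1)]


def eigen_residual(k):
    """Largest entry of G v - lambda v for the k-th closed-form eigenpair."""
    v = [math.sin((r + 1) * k * math.pi / (N + 1)) for r in range(N)]
    lam = eigenvalues[k - 1]
    return max(abs(sum(G[r][c] * v[c] for c in range(N)) - lam * v[r]) for r in range(N))


max_residual = max(eigen_residual(k) for k in range(1, N + 1))
singular_values = [math.sqrt(lam) for lam in eigenvalues]
print([round(s, 3) for s in singular_values], max_residual)
```

The computed values agree with the list in the example to within about 0.01; the second entry comes out closer to 1.79 than to the 1.78 quoted there.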



12.5 Challenges in Multicarrier Systems 

12.5.1 Peak to Average Power Ratio 

The peak to average power ratio (PAR) is a very important attribute of a communication system. A low PAR 
allows the transmit power amplifier to operate efficiently, whereas a high PAR forces the transmit power amplifier 
to have a large backoff in order to ensure linear amplification of the signal. This is demonstrated in Figure 12.11 
showing a typical power amplifier response. Operation in the linear region of this response is generally required 
to avoid signal distortion, so the peak value is constrained to be in this region. Clearly it would be desirable to 
have the average and peak values be as close together as possible in order to have the power amplifier operate at 
the maximum efficiency. Additionally, a high PAR requires high resolution for the receiver A/D converter, since
the dynamic range of the signal is much larger for high PAR signals. High resolution A/D conversion places a 
complexity and power burden on the receiver front end. 

The PAR of a continuous-time signal is given by

\[
\mathrm{PAR} \triangleq \frac{\max_t |x(t)|^2}{E_t[|x(t)|^2]}, \tag{12.38}
\]

and for a discrete-time signal it is given by

\[
\mathrm{PAR} \triangleq \frac{\max_n |x[n]|^2}{E_n[|x[n]|^2]}. \tag{12.39}
\]

Any constant amplitude signal, e.g. a square wave, has PAR = 0 dB. A sine wave has PAR = 3 dB since
$\max[\sin^2(2\pi t/T)] = 1$ and

\[
E[\sin^2(2\pi t/T)] = \frac{1}{T}\int_0^T \sin^2(2\pi t/T)\,dt = 0.5,
\]

so PAR = 1/0.5 = 2.

In general PAR should be measured with respect to the continuous-time signal using (12.38), since the input
to the amplifier is an analog signal. The PAR given by (12.38) is sensitive to the pulse shape $g(t)$ used in the
modulation, and does not generally lead to simple analytical formulas [41]. For illustration we will focus on the
PAR associated with the discrete-time signal, since it lends itself to a simple characterization. However, care
must be taken in interpreting these results, since by not taking into account the pulse shape $g(t)$ they can be quite
inaccurate.

Figure 12.11: A typical power amplifier response.
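The two reference values quoted for the discrete-time PAR (0 dB for a constant-amplitude signal and 3 dB for a sine wave) are easy to confirm from (12.39). The following sketch uses an assumed sampling grid of 1000 points per period; the function names are illustrative.

```python
import math


def par(samples):
    """Discrete-time PAR of (12.39): peak power over average power."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))


def to_db(ratio):
    """Convert a linear power ratio to dB."""
    return 10 * math.log10(ratio)


n = 1000  # samples per period (assumed, chosen so one sample lands on the peak)
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
square = [1.0 if k < n // 2 else -1.0 for k in range(n)]

par_sine = par(sine)
par_square = par(square)
print(par_sine, to_db(par_sine), par_square, to_db(par_square))
```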

Consider the time domain samples output from the IFFT:

\[
x[n] = \frac{1}{\sqrt{N}}\sum_{i=0}^{N-1} X[i]\,e^{j2\pi in/N}, \quad 0 \le n \le N-1. \tag{12.40}
\]

If $N$ is large, the Central Limit Theorem is applicable, and the $x[n]$ are zero-mean complex Gaussian random variables,
since the real and imaginary parts are each sums of $N$ terms. The Gaussian approximation for IFFT outputs is generally quite
accurate for a reasonably large number of subcarriers ($N \ge 64$). For $x[n]$ complex Gaussian, the envelope of the
OFDM signal is Rayleigh distributed with variance $\sigma^2$, and the phase of the signal is uniform. Since the Rayleigh
distribution has infinite support, the peak value of the signal will exceed any given value with nonzero probability.
It can then be shown that the probability that the PAR given by (12.39) exceeds a threshold $P_0 = A^2/\sigma^2$ (a squared
amplitude threshold normalized by $\sigma^2$) is given by [40]

\[
p(\mathrm{PAR} \ge P_0) = 1 - (1 - e^{-P_0})^N. \tag{12.41}
\]
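To get a feel for (12.41), the expression can be evaluated directly. This is a small sketch with a hypothetical function name; the threshold is passed in linear units, and the 10 dB operating point and subcarrier counts in the usage lines are arbitrary illustrative choices.

```python
import math


def par_exceed_probability(p0, n_sub):
    """Evaluate (12.41): probability that the PAR exceeds the linear threshold p0."""
    return 1.0 - (1.0 - math.exp(-p0)) ** n_sub


# Illustrative sweep (assumed values): a 10 dB threshold for several block sizes.
threshold = 10 ** (10 / 10)
for n_sub in (64, 256, 1024):
    print(n_sub, par_exceed_probability(threshold, n_sub))
```

For a fixed threshold the probability rises with $N$, in line with the discussion of PAR growth that follows.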

Let us now investigate how PAR grows with the number of subcarriers. Consider $N$ Gaussian i.i.d. random
variables $x_n$, $0 \le n \le N-1$, with mean zero and unit power. The average signal power $E_n[|x[n]|^2]$ is then

\[
E_n[|x[n]|^2] = \frac{1}{N}E\left[|x_0 + x_1 + \cdots + x_{N-1}|^2\right]
= \frac{1}{N}\left(E|x_0|^2 + E|x_1|^2 + \cdots + E|x_{N-1}|^2\right) = 1. \tag{12.42}
\]

The maximum value occurs when all the $x_i$ add coherently, in which case

\[
\max_n |x[n]|^2 = \max\left|\frac{x_0 + x_1 + \cdots + x_{N-1}}{\sqrt{N}}\right|^2 = \left|\frac{N}{\sqrt{N}}\right|^2 = N. \tag{12.43}
\]

Hence the maximum PAR is $N$ for $N$ subcarriers. In practice full coherent addition of all $N$ symbols is highly
improbable, so the observed PAR is typically less than $N$, usually by many dB. Nevertheless, PAR increases
approximately linearly with the number of subcarriers. So, although it is desirable to have $N$ as large as possible
in order to keep the overhead associated with the cyclic prefix down, a large PAR is an important penalty that must
be paid for large $N$.
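The worst case can be reproduced with a direct IDFT. The sketch below makes two assumptions for illustration: $N = 64$, and unit-modulus (QPSK) symbols rather than Gaussian ones. With identical symbols on every subcarrier, all terms add coherently at $n = 0$ and the PAR equals $N$; a pseudo-random QPSK block drawn with a fixed seed stays below that bound.

```python
import cmath
import math
import random


def idft_unitary(symbols):
    """Time samples x[n] of (12.40), with the 1/sqrt(N) normalization."""
    n_sub = len(symbols)
    return [sum(symbols[i] * cmath.exp(2j * math.pi * i * n / n_sub)
                for i in range(n_sub)) / math.sqrt(n_sub)
            for n in range(n_sub)]


def par(samples):
    """Discrete-time PAR of (12.39)."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))


n_sub = 64  # assumed block size

# Identical symbols on every subcarrier: full coherent addition at n = 0.
par_coherent = par(idft_unitary([1.0 + 0.0j] * n_sub))

# A pseudo-random QPSK block (fixed seed so the run is repeatable).
rng = random.Random(0)
qpsk = [cmath.exp(1j * math.pi / 4 * rng.choice([1, 3, 5, 7])) for _ in range(n_sub)]
par_random = par(idft_unitary(qpsk))

print(par_coherent, par_random)
```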

There are a number of ways to reduce or tolerate the PAR of OFDM signals, including clipping the OFDM
signal above some threshold, peak cancellation with a complementary signal, allowing non-linear distortion from
the power amplifier (and correction for it), and special coding techniques [31]. A good summary of some of these
techniques can be found in [38].

12.5.2 Frequency and Timing Offset 

OFDM modulation encodes the data symbols $X_i$ onto orthogonal subchannels, where orthogonality is assured
by the subcarrier separation $\Delta f = 1/T_N$. The subchannels may overlap in the frequency domain, as shown
in Figure 12.12 for a rectangular pulse shape in time (sinc function in frequency). In practice, the frequency
separation of the subcarriers is imperfect, so $\Delta f$ is not exactly equal to $1/T_N$. This is generally caused by
mismatched oscillators, Doppler frequency shifts, or timing synchronization errors. For example, if the carrier
frequency oscillator is accurate to 0.1 parts per million (ppm), the frequency offset is $\Delta f_\epsilon \approx (f_0)(0.1 \times 10^{-6})$. If
$f_0 = 5$ GHz, the carrier frequency for 802.11a WLANs, then $\Delta f_\epsilon = 500$ Hz, which will degrade the orthogonality
of the subchannels, since now the received samples of the FFT will contain interference from adjacent subchannels.
We'll now analyze this intercarrier interference (ICI).




Figure 12.12: OFDM Overlapping Subcarriers: Rectangular Pulses, $f_0 = 10$ Hz and $\Delta f = 1$ Hz (horizontal axis: frequency in Hz).

The signal corresponding to subcarrier $i$ can be simply expressed for the case of rectangular pulse shapes
(suppressing the data symbol and the carrier frequency) as

\[
x_i(t) = e^{j2\pi it/T_N}. \tag{12.44}
\]

An interfering subchannel signal can be written as

\[
x_{i+m}(t) = e^{j2\pi(i+m)t/T_N}. \tag{12.45}
\]

If the signal is demodulated with a frequency offset of $\delta/T_N$ then this interference becomes

\[
x_{i+m}(t) = e^{j2\pi(i+m+\delta)t/T_N}. \tag{12.46}
\]

The ICI between subchannel signals $x_i$ and $x_{i+m}$ is simply the inner product between them:

\[
I_m = \int_0^{T_N} x_i(t)\,x_{i+m}^*(t)\,dt = \frac{T_N\left(1 - e^{-j2\pi(\delta+m)}\right)}{j2\pi(m+\delta)}. \tag{12.47}
\]

It can be seen that in the above expression, $\delta = 0 \Rightarrow I_m = 0$, as expected. The total ICI power on subcarrier $i$ is
then

\[
\mathrm{ICI}_i = \sum_{m \neq i} |I_m|^2 \approx C_0 (T_N\delta)^2, \tag{12.48}
\]
where $C_0$ is some constant. Several important trends can be observed from this simple approximation. First, as $T_N$
increases, the subcarriers grow narrower and hence more closely spaced, which then results in more ICI. Second,
the ICI predictably grows with the frequency offset $\delta$, and the growth is about quadratic. Another interesting
observation is that (12.48) does not appear to be directly affected by $N$. But picking $N$ large generally forces $T_N$
to also be large, which then causes the subcarriers to be closer together. Along with the larger PAR that comes with
large $N$, the increased ICI is another reason to pick $N$ as low as possible, given that the overhead budget can be
met. In order to further reduce the ICI for a given choice of $N$, non-rectangular windows can also be used [30, 33].
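The roughly quadratic growth with $\delta$ can be seen by evaluating (12.47) directly. The sketch below assumes $N = 64$ subcarriers, a victim subcarrier in the middle of the band, and $T_N$ normalized to 1; these are illustrative choices, and it sums $|I_m|^2$ over the relative index $m \neq 0$ of all other subcarriers. It compares the total for two small offsets.

```python
import cmath
import math


def ici_coefficient(m, delta, t_n=1.0):
    """I_m of (12.47) for relative subcarrier index m and normalized offset delta."""
    shift = m + delta
    if shift == 0:
        return complex(t_n)  # m = 0, delta = 0: a subcarrier correlated with itself
    return t_n * (1 - cmath.exp(-2j * math.pi * shift)) / (2j * math.pi * shift)


def ici_power(delta, n_sub=64, i=32, t_n=1.0):
    """Total |I_m|^2 from every other subcarrier onto subcarrier i."""
    return sum(abs(ici_coefficient(m, delta, t_n)) ** 2
               for m in range(-i, n_sub - i) if m != 0)


power_small = ici_power(0.01)
power_double = ici_power(0.02)
ratio = power_double / power_small  # close to 4 if growth is quadratic in delta
power_zero_offset = ici_power(0.0)  # no offset: only rounding error remains

print(power_small, power_double, ratio, power_zero_offset)
```

Doubling the offset roughly quadruples the interference power, consistent with the $(T_N\delta)^2$ dependence in (12.48).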

The effects from timing offset are generally less than those from the frequency offset, as long as a full $N$
sample OFDM symbol is used at the receiver, without interference from the previous or subsequent OFDM symbols
(this is ensured by taking the cyclic prefix length $\mu \gg \sigma_{T_m}/T_s$, where $\sigma_{T_m}$ is the channel's rms delay spread). It
can be shown that the ICI power on subcarrier $i$ due to a receiver timing offset $\tau$ can be approximated as $2(\tau/T_N)^2$.
Since usually $\tau \ll T_N$, this effect is typically negligible.

12.6 Case Study: The IEEE 802.11a Wireless LAN Standard 

The IEEE 802.11a Wireless LAN standard, which occupies 20 MHz of bandwidth in the 5 GHz unlicensed band,
is based on OFDM [26]. The IEEE 802.11g standard is virtually identical, but operates in the smaller and more
crowded 2.4 GHz unlicensed ISM band [28]. In this section we study the properties of this OFDM design and
discuss some of the design choices.

In 802.11a, $N = 64$ subcarriers are generated, although only 48 are actually used for data transmission, with
the outer 12 zeroed in order to reduce adjacent channel interference, and 4 used as pilot symbols for channel
estimation. The cyclic prefix consists of $\mu = 16$ samples, so the total number of samples associated with each OFDM
symbol, including both data samples and the cyclic prefix, is 80. The transmitter gets periodic feedback from the
receiver about the packet error rate, which it uses to pick an appropriate error correction code and modulation
technique. The same code and modulation must be used for all the subcarriers at any given time. The error correction
code is a convolutional code with one of three possible coding rates: $r = 1/2$, $2/3$, or $3/4$. The modulation types that
can be used on the subchannels are BPSK, QPSK, 16QAM, or 64QAM.

Since the bandwidth $B$ (and sampling rate $1/T_s$) is 20 MHz, and there are 64 subcarriers evenly spaced over
that bandwidth, the subcarrier bandwidth is:

\[
B_N = \frac{20\ \text{MHz}}{64} = 312.5\ \text{KHz}.
\]

Since $\mu = 16$ and $1/T_s = 20$ MHz, the maximum delay spread for which ISI is removed is

\[
T_m < \mu T_s = \frac{16}{20\ \text{MHz}} = 0.8\ \mu\text{sec},
\]

which corresponds to delay spread in an indoor environment. Including both the OFDM symbol and cyclic prefix,
there are $80 = 64 + 16$ samples per OFDM symbol time, so the symbol time per subchannel is

\[
T_N = 80T_s = \frac{80}{20 \times 10^6} = 4\ \mu\text{s}.
\]
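The three derived quantities above follow from the stated bandwidth, subcarrier count, and prefix length, and can be recomputed in a few lines. The variable names are illustrative, and the last line (the fraction of each OFDM symbol spent on the prefix) is an extra figure computed from the same numbers rather than one quoted in the text.

```python
bandwidth_hz = 20e6      # B = 1/T_s
n_subcarriers = 64       # N
prefix_samples = 16      # mu

sample_time_s = 1.0 / bandwidth_hz
subcarrier_bandwidth_hz = bandwidth_hz / n_subcarriers             # B_N
max_delay_spread_s = prefix_samples * sample_time_s                # mu * T_s
symbol_time_s = (n_subcarriers + prefix_samples) * sample_time_s   # T_N
prefix_overhead = prefix_samples / (n_subcarriers + prefix_samples)

print(subcarrier_bandwidth_hz, max_delay_spread_s, symbol_time_s, prefix_overhead)
```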



The data rate per subchannel is $\log_2 M/T_N$. Thus, the minimum data rate for this system, corresponding to BPSK
(1 bit/symbol), an $r = 1/2$ code, and taking into account that only 48 subcarriers actually carry usable data, is given
by

\[
R_{\min} = 48\ \text{subcarriers} \times \frac{1/2\ \text{bit}}{\text{coded bit}} \times \frac{1\ \text{coded bit}}{\text{subcarrier symbol}} \times \frac{1\ \text{subcarrier symbol}}{4 \times 10^{-6}\ \text{seconds}} = 6\ \text{Mbps}. \tag{12.49}
\]

The maximum data rate that can be transmitted is

\[
R_{\max} = 48\ \text{subcarriers} \times \frac{3/4\ \text{bit}}{\text{coded bit}} \times \frac{6\ \text{coded bits}}{\text{subcarrier symbol}} \times \frac{1\ \text{subcarrier symbol}}{4 \times 10^{-6}\ \text{seconds}} = 54\ \text{Mbps}. \tag{12.50}
\]

Naturally, a wide range of data rates between these two extremes is possible.



Example 12.6: Find the data rate of an 802.11a system assuming 16QAM modulation and rate 2/3 coding.

Solution: With 16QAM modulation each subcarrier transmits $\log_2(16) = 4$ coded bits per subcarrier symbol and
there are a total of 48 subcarriers used for data transmission. With a rate 2/3 code, each coded bit relays 2/3 of an
information bit per $T_N$ seconds. Thus, the data rate is given by

\[
R = 48\ \text{subcarriers} \times \frac{2/3\ \text{bit}}{\text{coded bit}} \times \frac{4\ \text{coded bits}}{\text{subcarrier symbol}} \times \frac{1\ \text{subcarrier symbol}}{4 \times 10^{-6}\ \text{seconds}} = 32\ \text{Mbps}. \tag{12.51}
\]
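The rate calculations in (12.49) through (12.51) share one formula, which the following sketch wraps in a hypothetical helper. The defaults (48 data subcarriers, a 4 microsecond symbol time) come from the text; the function name and argument names are illustrative.

```python
import math


def data_rate_bps(constellation_size, code_rate, n_data_subcarriers=48,
                  symbol_time_s=4e-6):
    """Information bit rate: subcarriers x code rate x coded bits per symbol / T_N."""
    coded_bits_per_symbol = math.log2(constellation_size)
    return n_data_subcarriers * code_rate * coded_bits_per_symbol / symbol_time_s


rate_min = data_rate_bps(2, 1 / 2)       # BPSK, rate 1/2, as in (12.49)
rate_max = data_rate_bps(64, 3 / 4)      # 64QAM, rate 3/4, as in (12.50)
rate_example = data_rate_bps(16, 2 / 3)  # 16QAM, rate 2/3, as in Example 12.6

print(rate_min / 1e6, rate_max / 1e6, rate_example / 1e6)
```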
Chapter 12 Problems

1. Show that the minimum separation for subcarriers $\{\cos(2\pi jt/T_N + \phi_j),\ j = 1, 2, \ldots\}$ to form a set of
orthonormal basis functions on the interval $[0, T_N]$ is $1/T_N$ for any initial phase $\phi_j$. Show that if $\phi_j = 0\ \forall j$
then this carrier separation can be reduced by half.

2. Consider an OFDM system operating in a channel with coherence bandwidth $B_c = 10$ KHz.

(a) Find a subchannel symbol time $T_N = 1/B_N = 10T_m$, assuming $T_m = 1/B_c$. This should ensure
flat fading on the subchannels.

(b) Assume the system has $N = 128$ subchannels. If raised cosine pulses with $\beta = 1.5$ are used, and
the required additional bandwidth due to time limiting to ensure minimal power outside the signal
bandwidth is $\epsilon = .1$, what is the total bandwidth of the system?

(c) Find the total required bandwidth of the system using overlapping carriers separated by $1/T_N$, and
compare with your answer in part (b).

3. Show from the definition of the DFT that circular convolution of discrete-time sequences leads to multipli- 
cation of their DFTs. 

4. Consider a high-speed data signal with bandwidth .5 MHz and a data rate of .5 Mbps. The signal is
transmitted over a wireless channel with a delay spread of 10 $\mu$sec.

(a) If multicarrier modulation with nonoverlapping subchannels is used to mitigate the effects of ISI,
approximately how many subcarriers are needed? What is the data rate and symbol time on each
subcarrier? (We do not need to eliminate the ISI completely, so $T_s = T_m$ is enough.)

Assume for the remainder of the problem that the average received SNR ($\gamma_n$) on the $n$th subcarrier is
$1000/n$ (linear units) and that each subcarrier experiences flat Rayleigh fading (so ISI is completely
eliminated).

(b) Suppose BPSK modulation is used for each subcarrier. If a repetition code is used across all subcarriers
(i.e. a copy of each bit is sent over each subcarrier) then what is the BER after majority decoding? What
is the data rate of the system?

(c) Suppose you use adaptive loading (i.e. use different constellations on each subcarrier) such that the
average BER on each subcarrier does not exceed $10^{-3}$ (this is averaged over the fading distribution; do
not assume that the TX and RX adapt power or rate to the instantaneous fade values). Find the MQAM
constellation that can be transmitted over each subcarrier while meeting this average BER target. What
is the total data rate of the system with adaptive loading?

5. Consider a multicarrier modulation transmission scheme with three nonoverlapping subchannels spaced 200 
KHz apart (from carrier to carrier) with subchannel baseband bandwidth of 100 KHz. 

(a) For what values of the channel coherence bandwidth will the subchannels of your multicarrier scheme 
exhibit flat-fading (approximately no ISI)? For what values of the channel coherence bandwidth will the 
subcarriers of your multicarrier scheme exhibit independent fading? If the subcarriers exhibit correlated 
fading, what impact will this have on coding across subchannels? 

(b) Suppose you have a total transmit power $P = 300$ mW, and the noise power in each subchannel
is 1 mW. With equal power of 100 mW transmitted on each subchannel, the received SNR on each
subchannel is $\gamma_1 = 11$ dB, $\gamma_2 = 14$ dB, and $\gamma_3 = 18$ dB. Assume the subchannels do not experience
fading, so these SNRs are constant. For these received SNRs find the maximum signal constellation
size for MQAM that can be transmitted over each subchannel for a target BER of $10^{-3}$. Assume the
MQAM constellation is restricted to be a power of 2 and use the BER bound $\mathrm{BER} \le .2e^{-1.5\gamma/(M-1)}$
for your calculations. What is the corresponding total data rate of the multicarrier signal, assuming a
symbol time on each subchannel of $T_s = 1/B$, where $B$ is the baseband subchannel bandwidth?

(c) For the subchannel SNRs given in part (b), suppose we want to use precoding to equalize the received
SNR in each subchannel and then send the same signal constellation over each subchannel. What size
signal constellation is needed to achieve the same data rate as in part (b)? What transmit power would
be needed on each subchannel to achieve the required received SNR for this constellation with a $10^{-3}$
BER target? How much must the total transmit power be increased over the 300 mW transmit power
in part (b)?

6. Consider a channel with impulse response

\[
h(t) = \alpha_0\delta(t) + \alpha_1\delta(t - T_1) + \alpha_2\delta(t - T_2).
\]

Assume that $T_1 = 10$ $\mu$secs and $T_2 = 20$ $\mu$secs. You want to design a multicarrier system for the channel,
with subchannel bandwidth $B_N = B_c/2$. If raised cosine pulses with $\beta = 1$ are used, and the subcarriers
are separated by the minimum bandwidth necessary to remain orthogonal, then what is the total bandwidth
occupied by a multicarrier system with 8 subcarriers? Assuming a constant SNR on each subchannel of 20
dB, what is the maximum constellation size for MQAM modulation that can be sent over each subchannel
with a target BER of $10^{-3}$, assuming $M$ is restricted to be a power of 2? Also find the corresponding total
data rate of the system.



7. Show that the matrix representations (12.22) and (12.24) for the DMT system with a cyclic prefix
appended to the input are equivalent.

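8. Show that the DFT operation on $x[n]$ can be represented by the matrix multiplication $X[i] = \mathbf{Q}x[n]$ where

\[
\mathbf{Q} = \frac{1}{\sqrt{N}}
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^2 & \cdots & W_N^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2}
\end{bmatrix}
\tag{12.52}
\]

for $W_N = e^{-j2\pi/N}$.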



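9. This problem shows that the rows of the DFT matrix $\mathbf{Q}$ are eigenvectors of $\tilde{\mathbf{H}}$, the circulant matrix in (12.25).

(a) Show that the first row of $\mathbf{Q}$ is an eigenvector of $\tilde{\mathbf{H}}$ with eigenvalue $\lambda_0 = \sum_{i=0}^{\mu} h_i$.

(b) Show that the second row of $\mathbf{Q}$ is an eigenvector of $\tilde{\mathbf{H}}$ with eigenvalue $\lambda_1 = \sum_{i=0}^{\mu} h_i W_N^i$.

(c) Argue by induction that similar relations hold for all rows of $\mathbf{Q}$.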



10. Show that appending the all-zero prefix to an OFDM symbol and then adding in the tail of the received 
sequence, as shown in Figure 12.8, results in the same received sequence as with a cyclic prefix. 



11. Show that the two matrix representations of the DMT given by (12.22) and (12.24) are equivalent.

12. Consider a discrete-time FIR channel with $h[n] = .7 + .5\delta[n-1] + .3\delta[n-3]$. Consider an OFDM system
with $N = 8$ subchannels.

(a) Find the matrix $\mathbf{H}$ corresponding to the matrix representation of DMT $\mathbf{y} = \mathbf{H}\mathbf{x} + \boldsymbol{\nu}$ given in (12.23).

(b) Find the circulant convolution matrix $\tilde{\mathbf{H}}$ corresponding to the matrix representation in (12.25), as well
as its eigenvalue decomposition $\tilde{\mathbf{H}} = \mathbf{M}\boldsymbol{\Lambda}\mathbf{M}^H$.

(c) What are the flat-fading channel gains associated with each subchannel in the representation of part
(b)?

13. Consider a five-tap discrete-time channel

\[
H(z) = 1 + 0.6z^{-1} + .7z^{-2} + .3z^{-3} + .2z^{-4}.
\]

Assume this channel model characterizes the maximum delay spread of the channel. Assume a VC system
is used over this channel with $N = 256$ carriers.

(a) What value of $\mu$ is needed for the prefix to eliminate ISI between VC symbols? What is the overhead
associated with this $\mu$?

(b) Find the system matrix representation (12.23) and the singular values of the associated channel matrix
$\mathbf{H}$.

(c) Find the transmit precoding and shaping matrices, $\mathbf{V}$ and $\mathbf{U}^H$, required to orthogonalize the
subchannels.

14. Suppose the 4 subchannels in 802.11a used for pilot estimation could be used for data transmission by taking
advantage of blind estimation techniques. What maximum and minimum data rates could be achieved by
including these extra subchannels, assuming the same modulation and coding formats are available?

15. Find the data rate of an 802.11a system assuming half the available 48 subchannels use BPSK with a rate 
1/2 channel code and the others use 64QAM with a rate 3/4 channel code. 

16. Find the PAR of a raised cosine pulse with $\beta = 0, 1, 2$. Which pulse shape has the lowest PAR? Is this pulse
shape more or less sensitive to timing errors?

17. Find the constant $C_0$ associated with intercarrier interference in (12.48).
Chapter 13

Spread Spectrum 



Although bandwidth is a valuable commodity in wireless systems, increasing the transmit signal bandwidth can
sometimes improve performance. Spread spectrum is a technique that increases signal bandwidth beyond the
minimum necessary for data communication. There are many reasons to do this. Spread spectrum techniques can
hide a signal below the noise floor, making it difficult to detect. Spread spectrum also mitigates the performance
degradation due to ISI and narrowband interference. In conjunction with a RAKE receiver, spread spectrum can
provide coherent combining of different multipath components. Spread spectrum also allows multiple users to
share the same signal bandwidth, since spread signals can be superimposed on top of each other and demodulated
with minimal interference between them. Finally, the wide bandwidth of spread spectrum signals is useful for
location and timing acquisition.

Spread spectrum first achieved widespread use in military applications due to its inherent property of hiding
the spread signal below the noise floor during transmission, its resistance to narrowband jamming and interference,
and its low probability of detection and interception. For commercial applications, the narrowband interference
resistance has made spread spectrum common in cordless phones. The ISI rejection and bandwidth sharing
capabilities of spread spectrum are very desirable in cellular systems and wireless LANs. As a result, spread spectrum
is the basis for both 2nd and 3rd generation cellular systems as well as 2nd generation wireless LANs.

13.1 Spread Spectrum Principles 

Spread spectrum is a modulation method applied to digitally modulated signals that increases the transmit signal
bandwidth to a value much larger than is needed to transmit the underlying information bits. There are many
signaling techniques that increase the transmit bandwidth above the minimum required for data transmission,
for example coding and frequency modulation. However, these techniques do not fall in the category of spread
spectrum. The following three properties are needed for a signal to be spread spectrum modulated [1]:

• The signal occupies a bandwidth much larger than is needed for the information signal. 

• The spread spectrum modulation is done using a spreading code, which is independent of the data in the 
signal. 

• Despreading at the receiver is done by correlating the received signal with a synchronized copy of the spread- 
ing code. 

To make these notions precise, we return to the signal space representation of Chapter 5.1 to investigate
embedding an information signal of bandwidth $B$ into a much larger bandwidth $B_s$ than is needed. From (5.3), a
set of linearly independent signals $s_i(t)$, $i = 1, \ldots, M$, of bandwidth $B$ and time duration $T$ can be written using a
basis function representation as

\[
s_i(t) = \sum_{j=1}^{N} s_{ij}\phi_j(t), \quad 0 \le t < T, \tag{13.1}
\]

where the basis functions $\phi_j(t)$ are orthonormal and span an $N$-dimensional space. One of these signals is
transmitted every $T$ seconds to convey $\log_2 M/T$ bits per second. As discussed in Chapter 5.1.2, the minimum number
of basis functions needed to represent these signals is $M \approx 2BT$. Hence, to embed these signals into a larger
dimensional space, we choose $N \gg M$. The receiver uses an $M$ branch structure where the $i$th branch correlates
the received signal with $s_i(t)$. The receiver outputs the signal corresponding to the branch with the maximum
correlator output.

Suppose we generate the signals Si(t) using random sequences, so that the sequence of coefficients Sij are 
chosen based on a random sequence generation where each coefficient has mean zero and variance E s /N. Thus, 
the signals s t (t) will have their energies uniformly distributed over the signal space of dimension N. Consider an 
interference or jamming signal within this signal space. This signal can be represented as 

N 

= (13.2) 

3 = 1 



with total energy over [0, T] given by 




N 

I 2 (t)dt = J2 I j= E J- 

3 = 1 



(13.3) 



Suppose the signal Si(t) is transmitted. Neglecting noise, the received signal is the sum of the transmitted signal 
plus interference: 

x(t) = Si(t) + I(t). (13.4) 



The output of the correlator in the $i$th branch of the receiver is then

$$ x_i = \int_0^T x(t)s_i(t)\,dt = \sum_{j=1}^{N}\left(s_{ij}^2 + I_j s_{ij}\right), \tag{13.5} $$

where the first term in this expression represents the signal and the second term the interference. It can be shown [1] that the signal-to-interference power ratio (SIR) of this signal is

$$ \mathrm{SIR} = \frac{E_s}{E_J} \times \frac{N}{M}. \tag{13.6} $$

This result is independent of the distribution of the interferer's energy over the $N$-dimensional signal space. In other words, by spreading the interference power over a larger dimension $N$ than the required signaling dimension $M$, the SIR is increased by $G = N/M$, where $G$ is called the processing gain. In practice spread spectrum systems have processing gains on the order of 100-1000. Since $N \approx 2B_sT$ and $M \approx 2BT$, we have $G \approx B_s/B$, the ratio of the spread signal bandwidth to the information signal bandwidth. Processing gain is often defined as this bandwidth ratio or something similar, but its underlying meaning is generally related to the performance improvement of a spread spectrum system relative to a non-spread system in the presence of interference [2]. Note that block and convolutional coding are also techniques that improve performance in the presence of noise or interference by increasing signal bandwidth. An interesting tradeoff arises as to whether, given a specific spreading bandwidth, it is more beneficial to use coding or spread spectrum. The answer depends on the specifics of the system design [4].
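As a rough numerical illustration of (13.6) and the bandwidth-ratio form of the processing gain, the following sketch evaluates $G \approx B_s/B$ and the resulting SIR. The function names and the numbers (a 1.25 MHz spread bandwidth, a 10 kHz data bandwidth, and an interferer ten times stronger than the signal) are hypothetical values chosen for illustration only.

```python
import math


def processing_gain(spread_bw_hz, info_bw_hz):
    """Processing gain G ~ B_s / B as a linear ratio."""
    if spread_bw_hz <= 0 or info_bw_hz <= 0:
        raise ValueError("bandwidths must be positive")
    return spread_bw_hz / info_bw_hz


def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)


def sir_with_spreading(signal_energy, interferer_energy, gain):
    """SIR from (13.6): (E_s / E_J) * G, with G standing in for N / M."""
    return (signal_energy / interferer_energy) * gain


# Hypothetical example values, not taken from the text.
g = processing_gain(1.25e6, 10e3)          # G = 125
sir = sir_with_spreading(1.0, 10.0, g)     # interferer 10 dB above the signal
print(f"G = {g:.0f} ({to_db(g):.1f} dB), SIR = {to_db(sir):.1f} dB")
```

Under these assumed numbers the interferer starts 10 dB above the signal, and a processing gain of about 21 dB leaves an SIR of roughly 11 dB after despreading.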

Spread spectrum is typically implemented in one of two forms: direct sequence (DS) or frequency hopping (FH). In direct sequence spread spectrum (DSSS) modulation, the modulated data signal $s(t)$ is multiplied by a wideband spreading signal or code $s_c(t)$, where $s_c(t)$ is constant over a time duration $T_c$ and has amplitude equal to 1 or -1. The spreading code bits are usually referred to as chips, and $1/T_c$ is called the chip rate. The bandwidth $B_c \approx 1/T_c$ of $s_c(t)$ is roughly $B_c/B \approx T_s/T_c$ times bigger than the bandwidth $B$ of the modulated signal $s(t)$, and the number of chips per bit, $T_s/T_c$, is an integer approximately equal to $G$, the processing gain of the system. Multiplying the modulated signal by the spreading signal results in the convolution of these two signals in the frequency domain. Thus, the transmitted signal $s(t)s_c(t)$ has frequency response $S(f) * S_c(f)$, which has a bandwidth of roughly $B_c + B$. The multiplication of a spreading signal with a BPSK-modulated data signal is illustrated in Figure 13.1.

Figure 13.1: Spreading Signal Multiplication

For an AWGN channel the received spread signal is $s(t)s_c(t) + n(t)$. If the receiver multiplies this signal by a synchronized replica of the spreading signal, this yields $s(t)s_c^2(t) + n(t)s_c(t)$. Since $s_c(t) = \pm 1$, $s_c^2(t) = 1$. Moreover, $n'(t) = n(t)s_c(t)$ has approximately the same statistics as $n(t)$ if $s_c(t)$ is zero mean and sufficiently wideband (i.e. its autocorrelation approximates a delta function). Thus, the received signal is $s(t)s_c^2(t) + n(t)s_c(t) = s(t) + n'(t)$, indicating that spreading and despreading have no impact on signals transmitted over AWGN channels. However, spreading and despreading have tremendous benefits when the channel introduces narrowband interference or ISI.
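The identity $s_c^2(t) = 1$ can be checked at the chip level. The sketch below is a minimal, noise-free baseband model that assumes one chip sample per chip time and an arbitrary 7-chip $\pm 1$ code; the helper names `spread` and `despread` are illustrative and not from the text.

```python
def spread(symbols, code):
    """Hold each +/-1 symbol for len(code) chips and multiply by the code."""
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    """Multiply by the synchronized code and average over each symbol time."""
    n = len(code)
    out = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        out.append(sum(x * c for x, c in zip(block, code)) / n)
    return out


code = [1, 1, 1, -1, 1, -1, -1]   # illustrative 7-chip code (assumption)
data = [1, -1, -1, 1]             # BPSK symbols
tx = spread(data, code)           # 28 chips, bandwidth expanded 7x
rx = despread(tx, code)           # code * code = 1, so the data is recovered
print(rx)
```

Because every chip is squared in the despreader, the recovered symbols equal the transmitted ones regardless of which $\pm 1$ code is used, as long as the receiver copy is synchronized.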

We now illustrate the narrowband interference and multipath rejection properties of direct sequence spread spectrum (DSSS) in the frequency domain; more details will be given in later sections. We first consider narrowband interference rejection, as shown in Figure 13.2. Neglecting noise, we see that the receiver input consists of the spread modulated signal $S(f) * S_c(f)$ and the narrowband interference $I(f)$. The despreading in the receiver recovers the data signal $S(f)$. However, the interference signal $I(t)$ is multiplied by the spreading signal $s_c(t)$, resulting in their convolution $I(f) * S_c(f)$ in the frequency domain. Thus, receiver despreading has the effect of distributing the interference power over the bandwidth of the spreading code. The demodulation of the modulated signal $s(t)$ effectively acts as a lowpass filter, removing most of the energy of the spread interference, which reduces its power by the processing gain $G \approx B_c/B$.
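A chip-level sketch of this effect is given below. It makes a strong simplifying assumption: the interferer is modeled as a constant of amplitude `A` over one symbol (the narrowest possible interferer), and the code is an arbitrary 7-chip $\pm 1$ sequence with one more $+1$ than $-1$. For this idealized case the residual interference equals `A` times the code's mean value, so it illustrates why a near-zero DC component matters rather than reproducing the factor $G$ exactly.

```python
def correlate(rx_chips, code):
    """Despread with the code and average over one symbol time."""
    return sum(x * c for x, c in zip(rx_chips, code)) / len(code)


code = [1, 1, 1, -1, 1, -1, -1]    # illustrative code, sum(code) == 1
no_spread = [1] * len(code)        # "code" of all ones, i.e. no spreading
symbol = 1
A = 3.0                            # assumed constant interferer amplitude

rx_ds = [symbol * c + A for c in code]
rx_plain = [symbol * c + A for c in no_spread]

out_ds = correlate(rx_ds, code)             # 1 + A * sum(code) / 7
out_plain = correlate(rx_plain, no_spread)  # 1 + A
print(out_ds - symbol, out_plain - symbol)  # residual interference in each case
```

Without spreading the full interferer amplitude reaches the decision device, while with spreading only the fraction `sum(code) / len(code)` survives the integration.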



Figure 13.2: Narrowband Interference Rejection in DSSS. (Panels: modulated signal $S(f)$; receiver input with narrowband interference $I(f)$; despread signal $S(f)$.)

ISI rejection, illustrated in Figure 13.3, is based on a similar premise. Suppose the spread signal $s(t)s_c(t)$ is transmitted through a two-path channel with impulse response $h(t) = \alpha\delta(t) + \beta\delta(t - \tau)$. Then $H(f) = \alpha + \beta e^{-j2\pi f\tau}$, resulting in a receiver input in the absence of noise equal to $H(f)[S(f) * S_c(f)]$ in the frequency domain or $[s(t)s_c(t)] * h(t) = \alpha s(t)s_c(t) + \beta s(t-\tau)s_c(t-\tau)$ in the time domain. Suppose the receiver despreading process multiplies this signal by a copy of $s_c(t)$ synchronized to the first path of this two-path model. This results in the time domain signal $\alpha s(t)s_c^2(t) + \beta s(t-\tau)s_c(t-\tau)s_c(t)$. Since the second multipath component $\beta s'(t) = \beta s(t-\tau)s_c(t-\tau)s_c(t)$ includes the product of asynchronized copies of $s_c(t)$, it remains spread out over the spreading code bandwidth, and the demodulation process will remove most of its energy. More precisely, as described in Section 13.2, the demodulation process effectively attenuates the multipath component by the autocorrelation $\rho_c(\tau)$ of the spreading code at delay $\tau$. This autocorrelation can be quite small when $\tau > T_c$, on the order of $1/G \approx T_c/T_s$, resulting in significant mitigation of the ISI when the modulated signal is spread over a wide bandwidth. Since the spreading code autocorrelation determines the ISI rejection of the spread spectrum system, it is important to use spreading codes with good autocorrelation properties. The tradeoffs in spreading code designs are discussed in the next section.



Figure 13.3: ISI Rejection in DSSS. (Panels: modulated signal $S(f)$; receiver input $[S(f) * S_c(f)][\alpha + \beta e^{-j2\pi f\tau}]$; despread signal $\alpha S(f)$.)



The basic premise of frequency hopping spread spectrum (FHSS) is to hop the modulated data signal over a wide bandwidth by changing its carrier frequency according to a spreading code $s_c(t)$.¹ This process is illustrated in Figure 13.4. The chip time $T_c$ dictates the time between hops, i.e. the time duration over which the modulated data signal is centered at a given carrier frequency $f_i$ before hopping to a new carrier frequency. The hop time can exceed a symbol time, $T_c = kT_s$ for some integer $k$, which is called slow frequency hopping (SFH), or the carrier can be changed multiple times per symbol, $T_c = T_s/k$ for some integer $k$, which is called fast frequency hopping (FFH). In FFH there is frequency diversity on every symbol, which protects each symbol against narrowband interference and spectral nulls due to frequency-selective fading. The bandwidth of the FH system is approximately equal to $NB$, where $N$ is the number of carrier frequencies available for hopping and $B$ is the bandwidth of the data signal. The signal is generated using a frequency synthesizer that determines the modulating carrier frequency from the chip sequence, typically using a form of FM modulation such as CPFSK. In the receiver, the signal is demodulated using a similar frequency synthesizer, synchronized to the chip sequence $s_c(t)$, that generates the sequence of carrier frequencies from this chip sequence for downconversion. As with DS, FH has no impact on performance in an AWGN channel. However, it does mitigate the effects of narrowband interference and multipath.

¹ The concept of frequency hopping was invented during World War II by the film star Hedy Lamarr and the composer George Antheil. Their patent for a "Secret Communications System" used a chip sequence generated by a player piano roll to hop between 88 frequencies. The design was intended to make radio-guided torpedoes hard to detect or jam.




Figure 13.4: Frequency Hopping. 

Consider a narrowband interferer of bandwidth $B$ at a carrier frequency $f_i$ corresponding to one of the carriers used by the FH system. The interferer and FH signal occupy the same bandwidth only when carrier $f_i$ is generated by the hop sequence. If the hop sequence spends an equal amount of time at each of the carrier frequencies, then interference occurs a fraction $1/N$ of the time, and thus the average interference power is reduced by a factor of roughly $N$. However, the nature of the interference reduction is different in FH versus DS systems. In particular, DS results in a reduced-power interference all the time, whereas FHSS has a full-power interferer a fraction of the time. In FFH systems the interference affects only a fraction of a symbol time, so coding may not be required to compensate for this interference. In SFH systems the interference affects many symbols, so typically coding with interleaving is needed to avoid many simultaneous errors in a single codeword. FH is commonly used in military systems, where the interferers are assumed to be malicious jammers attempting to disrupt communications.
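The $1/N$ hit fraction can be illustrated with a small sketch. A real system would derive the hop pattern from a pseudorandom chip sequence; here a deterministic stride pattern is used as a stand-in (an assumption made so the result is exactly reproducible), and the names `hop_sequence` and `jammed_fraction` are hypothetical.

```python
from math import gcd


def hop_sequence(num_carriers, num_hops, step=3, start=0):
    """Deterministic stand-in for a PN-driven hop pattern.

    With gcd(step, num_carriers) == 1 the pattern visits every carrier
    equally often over any multiple of num_carriers hops.
    """
    if gcd(step, num_carriers) != 1:
        raise ValueError("step must be coprime with num_carriers")
    return [(start + step * k) % num_carriers for k in range(num_hops)]


def jammed_fraction(hops, jammed_carrier):
    """Fraction of hops that land on the interferer's carrier."""
    return sum(1 for h in hops if h == jammed_carrier) / len(hops)


N = 8                                   # number of carriers (example value)
hops = hop_sequence(N, 10 * N)
frac = jammed_fraction(hops, jammed_carrier=5)
print(frac)                             # fraction of time the interferer is hit
```

With eight carriers visited uniformly, the interferer is present during one hop in eight, at full power, which is the FH behavior contrasted with DS in the paragraph above.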

We now investigate the impact of multipath on an FH system. For simplicity, we consider a two-path channel that introduces a multipath component with delay $\tau$. Suppose the receiver synchronizes to the hop sequence associated with the LOS signal path. Then the LOS path is demodulated at the desired carrier frequency. However, the multipath component arrives at the receiver with a delay $\tau$. If $\tau > T_c$ then the receiver will have hopped to a new carrier frequency $f_j \neq f_i$ for downconversion when the multipath component, centered at carrier frequency $f_i$, arrives at the receiver. Since the multipath occupies a different frequency band than the LOS signal component being demodulated, it causes negligible interference to the demodulated signal. Thus, the demodulated signal does not exhibit either flat or frequency-selective fading for $\tau > T_c$. If $\tau < T_c$ then the impact of multipath depends on the bandwidth $B$ of the modulated data signal as well as the hop rate. First consider an FFH system where $T_c \ll T_s$. Since we also assume $\tau < T_c$, we have $\tau < T_c \ll T_s$. Since all the multipath arrives within a symbol time, the multipath introduces a complex amplitude gain and the signal experiences flat fading. Now consider an SFH system where $T_c \gg T_s$. Since we also assume $\tau < T_c$, all the multipath will arrive while the signal is at the same carrier frequency, so the impact of multipath is the same as if there were no frequency hopping: for $B < 1/\tau$ the signal experiences flat fading, and for $B > 1/\tau$ the signal experiences frequency-selective fading. The fading channel also varies slowly over time, since the baseband equivalent channel changes whenever the carrier hops to a new frequency. In summary, frequency hopping removes the impact of multipath on demodulation of the LOS component whenever $\tau > T_c$. For $\tau < T_c$, an FFH system will exhibit flat fading, and an SFH system will exhibit slowly varying flat fading for $B < 1/\tau$ and slowly varying frequency-selective fading for $B > 1/\tau$. The performance analysis under time-varying flat or frequency-selective fading is the same as for systems without hopping, as given in Chapter 6.3 and Chapter 6.5, respectively.

In addition to their interference and ISI rejection capabilities, both DSSS and FHSS provide a mechanism 
for multiple access, allowing many users to simultaneously share the spread bandwidth with minimal interference 
between users. In these multiuser systems, the interference between users is determined by the cross-correlation 
of their spreading codes. Spreading code designs typically have either good autocorrelation properties to mitigate 
ISI or good cross-correlation properties to mitigate multiuser interference. However, there is usually a tradeoff 
between optimizing the autocorrelation and optimizing the cross-correlation. Thus, the best choice of code design 
depends on the number of users in the system and the severity of the multipath and interference. Frequency hopping 
has some benefits over direct sequence in multiuser systems, and is also used in cellular systems to average out 
interference from other cells. 



Example 13.1: Consider an SFH system with hop time $T_c = 10\ \mu$s and symbol time $T_s = 1\ \mu$s. If the FH signal is transmitted over a multipath channel, for approximately what range of multipath delay spreads will the received despread signal exhibit frequency-selective fading?

Solution: Based on the two-path model analysis, the signal only exhibits fading, flat or frequency-selective, when the delay spread $\tau < T_c = 10\ \mu$s. Moreover, for frequency-selective fading we require $B \approx 1/T_s = 10^6 > 1/\tau$, i.e. we require $\tau > 10^{-6}$ s $= 1\ \mu$s. So the despread signal will exhibit frequency-selective fading for delay spreads ranging from approximately 1 to 10 $\mu$s.
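The two-path rules summarized before Example 13.1 can be written as a small classifier, sketched below. It assumes $B \approx 1/T_s$, treats $T_c < T_s$ as FFH and $T_c \ge T_s$ as SFH, and ignores the boundary cases that the approximate analysis does not resolve; the function name `fh_fading` is hypothetical.

```python
def fh_fading(tau, t_c, t_s):
    """Classify the fading seen after dehopping for a two-path channel.

    tau: multipath delay (s), t_c: hop (chip) time (s), t_s: symbol time (s).
    Assumes the data bandwidth is B ~ 1 / t_s.
    """
    if tau <= 0 or t_c <= 0 or t_s <= 0:
        raise ValueError("all times must be positive")
    if tau > t_c:
        return "none"                 # receiver has already hopped away
    if t_c < t_s:
        return "flat"                 # FFH: tau < T_c << T_s
    bandwidth = 1.0 / t_s             # SFH: same as no hopping
    return "frequency-selective" if bandwidth > 1.0 / tau else "flat"


# Example 13.1 parameters: T_c = 10 us, T_s = 1 us.
for tau in (0.5e-6, 5e-6, 20e-6):
    print(tau, fh_fading(tau, t_c=10e-6, t_s=1e-6))
```

For the Example 13.1 parameters the classifier reports flat fading at 0.5 $\mu$s, frequency-selective fading at 5 $\mu$s, and no fading at 20 $\mu$s, in line with the 1 to 10 $\mu$s range found in the solution.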



13.2 Direct Sequence Spread Spectrum (DSSS) 

13.2.1 DSSS System Model 

An end-to-end direct sequence spread spectrum system is illustrated in Figure 13.5. The multiplication by $s_c(t)$ and the carrier $\cos(2\pi f_c t)$ could be done in the opposite order as well: downconverting prior to despreading allows the code synchronization and despreading to be done digitally, but complicates carrier phase tracking since it must be done relative to the wideband spread signal.² For simplicity we only illustrate the receiver for in-phase signaling; a similar structure is used for the quadrature signal component. The data symbols $s_l$ are first linearly modulated to form the baseband modulated signal $x(t) = \sum_l s_l g(t - lT_s)$, where $g(t)$ is the modulator shaping pulse, $T_s$ is the symbol time, and $s_l$ is the symbol transmitted over the $l$th symbol time. Linear modulation is used since DSSS is a form of phase modulation and therefore works best in conjunction with a linearly modulated data signal. The modulated signal is then multiplied by the spreading code $s_c(t)$ with chip time $T_c$, and then upconverted through multiplication by the carrier $\cos(2\pi f_c t)$. The spread signal passes through the channel $h(t)$, which also introduces additive noise $n(t)$ and narrowband interference $I(t)$.

² A system where spreading and despreading are done on the bandpass modulated signal would work as follows. The transmitter would consist of a standard narrowband modulator that would generate a passband modulated signal, followed by spreading. The receiver would consist of despreading, followed by a standard narrowband demodulator. This order of operations makes it straightforward to design a spread spectrum system using existing narrowband modulators and demodulators, and operations such as carrier phase recovery would not be affected by spreading. However, spread spectrum systems today do as much of the signal processing as possible in the digital domain. Thus, spread spectrum systems typically modulate the data symbols and multiply by the spreading code at baseband using digital signal processing, followed by D/A conversion and analog upconversion to the carrier frequency. In this case all functions prior to the carrier multiplication in Figure 13.5 would be done digitally, and there would be a D/A converter following the multiplication with $s_c(t)$. However, the carrier recovery loop would be more challenging since it would operate on the spread signal. In particular, any nonlinear operation, such as squaring, that is used to remove either the data or the spreading sequence in carrier phase recovery would be seriously degraded by the noise associated with the spread signal.




Figure 13.5: DSSS System Model (transmitter and receiver block diagrams).

Assume the channel introduces several multipath components: $h(t) = \alpha_0\delta(t - \tau_0) + \alpha_1\delta(t - \tau_1) + \cdots$. The received signal is first downconverted to baseband. The synchronizer then uses the resulting baseband signal $z(t)$ to align the delay $\tau$ of the receiver spreading code generator with one of the multipath component delays $\tau_i$. The spreading code generator then outputs the spreading code $s_c(t - \tau)$, where $\tau = \tau_i$ if the synchronizer is perfectly aligned with the delay associated with the $i$th multipath component. Ideally the synchronizer would lock to the multipath component with the largest amplitude. However, in practice this requires a complex search procedure, so instead the synchronizer typically locks to the first component it finds with an amplitude above a given threshold. This synchronization procedure can be quite complex, especially for channels with severe ISI or interference, and synchronization circuitry can make up a large part of any spread spectrum receiver. Synchronization is discussed in more detail in Section 13.2.3.

The multipath component at delay $\tau$ is despread by multiplying it with the spreading code $s_c(t - \tau)$. The other multipath components are not despread, and most of their energy is removed, as we shortly show. After despreading, the baseband signal $\hat{x}(t)$ passes through a matched filter and decision device. Thus, there are three stages in the receiver demodulation for direct sequence spread spectrum: downconversion, despreading, and baseband demodulation. This demodulator is also called the single-user matched-filter detector for DSSS. We now examine the three stages of this detector in more detail.

For simplicity, assume rectangular pulses are used in the modulation ($g(t) = \sqrt{2/T_s}$, $0 \le t \le T_s$). The matched filter $g^*(-t)$ then simply multiplies $\hat{x}(t)$ by $\sqrt{2/T_s}$ and integrates from zero to $T_s$ to obtain the estimate of the transmitted symbol. Since coherent modulation is assumed, we neglect any carrier phase offset in the transmitter or receiver. We also assume perfect synchronization in the receiver. The multipath and interference rejection occurs in the data demodulation process. Specifically, the input to the matched filter is given by

$$ \hat{x}(t) = [x(t)s_c(t)\cos(2\pi f_c t) * h(t)]\,s_c(t-\tau)\cos(2\pi f_c t) + n(t)s_c(t-\tau)\cos(2\pi f_c t) + I(t)s_c(t-\tau)\cos(2\pi f_c t). \tag{13.7} $$

Without multipath, $h(t) = \delta(t)$ and the receiver ideally synchronizes with $\tau = 0$. Then the spreading/despreading process has no impact on the baseband signal $x(t)$. Specifically, the spreading code has amplitude $\pm 1$, so multiplying $s_c(t)$ by a synchronized copy of itself yields $s_c^2(t) = 1$ for all $t$. Then, in the absence of multipath and interference, i.e. for $h(t) = \delta(t)$ and $I(t) = 0$,

$$ \hat{x}(t) = x(t)s_c^2(t)\cos^2(2\pi f_c t) + n(t)s_c(t)\cos(2\pi f_c t) = x(t)\cos^2(2\pi f_c t) + n(t)s_c(t)\cos(2\pi f_c t), \tag{13.8} $$

since $s_c^2(t) = 1$. If $s_c(t)$ is sufficiently wideband then $n(t)s_c(t)$ has approximately the same statistics as $n(t)$, i.e. it is a zero-mean AWGN random process with PSD $N_0/2$. The matched filter output over a symbol time will thus be



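$$
\begin{aligned}
\hat{s}_l &= \sqrt{\frac{2}{T_s}}\int_0^{T_s}\hat{x}(t)\,dt \\
&= \sqrt{\frac{2}{T_s}}\int_0^{T_s} x(t)\cos^2(2\pi f_c t)\,dt + \sqrt{\frac{2}{T_s}}\int_0^{T_s} n(t)s_c(t)\cos(2\pi f_c t)\,dt \\
&= \frac{2}{T_s}\int_0^{T_s} s_l\cos^2(2\pi f_c t)\,dt + \sqrt{\frac{2}{T_s}}\int_0^{T_s} n(t)s_c(t)\cos(2\pi f_c t)\,dt \\
&\approx s_l + n_l,
\end{aligned}
\tag{13.9}
$$

where $s_l$ and $n_l$ correspond to the data and noise output of a standard demodulator without spreading or despreading, and the approximation assumes $f_c \gg 1/T_s$.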

We now consider the interference signal $I(t)$ at the carrier frequency $f_c$, which can be modeled as $I(t) = I'(t)\cos(2\pi f_c t)$ for some narrowband baseband signal $I'(t)$. We again assume $h(t) = \delta(t)$. Multiplication by the spreading signal perfectly synchronized to the incoming signal yields

$$ \hat{x}(t) = x(t)\cos^2(2\pi f_c t) + n(t)s_c(t)\cos(2\pi f_c t) + I'(t)s_c(t)\cos^2(2\pi f_c t), \tag{13.10} $$

where $n(t)s_c(t)$ is assumed to be a zero-mean AWGN process. The demodulator output is then given by

$$
\begin{aligned}
\hat{s}_l &= \frac{2}{T_s}\int_0^{T_s} s_l s_c^2(t)\cos^2(2\pi f_c t)\,dt + \sqrt{\frac{2}{T_s}}\int_0^{T_s} n(t)s_c(t)\cos(2\pi f_c t)\,dt + \sqrt{\frac{2}{T_s}}\int_0^{T_s} I'(t)s_c(t)\cos^2(2\pi f_c t)\,dt \\
&\approx s_l + n_l + I_l,
\end{aligned}
\tag{13.11}
$$

where $s_l$ and $n_l$ correspond to the data and noise output of a standard demodulator without spreading or despreading, $I_l$ is the residual interference from the last integral, and the approximation assumes $f_c \gg 1/T_s$. The narrowband interference rejection can be seen from the last term of (13.11). In particular, the spread interference $I'(t)s_c(t)$ is a wideband signal with bandwidth of roughly $1/T_c$, and the integration acts as a lowpass filter with bandwidth of roughly $1/T_s \ll 1/T_c$, thereby removing most of the interference power.

Let us now consider ISI rejection. Assume a multipath channel with one delayed component: $h(t) = \alpha_0\delta(t) + \alpha_1\delta(t - \tau_1)$. For simplicity, assume $\tau_1 = kT_s$ is an integer multiple of the symbol time. Suppose that the first multipath component is stronger than the second, $\alpha_0 > \alpha_1$, and that the receiver synchronizes to the first component ($\tau = 0$ in Figure 13.5). Then, in the absence of narrowband interference ($I(t) = 0$), after despreading we have

$$ \hat{x}(t) = \alpha_0 x(t)\cos^2(2\pi f_c t) + \alpha_1 x(t-\tau_1)s_c(t-\tau_1)\cos(2\pi f_c(t-\tau_1))s_c(t)\cos(2\pi f_c t) + n(t)s_c(t)\cos(2\pi f_c t). \tag{13.12} $$

Since $\tau_1 = kT_s$, the ISI just corresponds to the signal transmission of the $(l-k)$th symbol, i.e. over the $l$th symbol time $x(t - \tau_1) = x(t - kT_s) = s_{l-k}\,g(t - lT_s)$. The demodulator output over the $l$th symbol time is then given by

$$
\begin{aligned}
\hat{s}_l &= \frac{2}{T_s}\int_0^{T_s}\alpha_0 s_l\cos^2(2\pi f_c t)\,dt + \frac{2}{T_s}\int_0^{T_s}\alpha_1 s_{l-k}s_c(t)s_c(t-\tau_1)\cos(2\pi f_c t)\cos(2\pi f_c(t-\tau_1))\,dt \\
&\quad + \sqrt{\frac{2}{T_s}}\int_0^{T_s} n(t)s_c(t)\cos(2\pi f_c t)\,dt
\end{aligned}
\tag{13.13}
$$

$$ \approx \alpha_0 s_l + \alpha_1 s_{l-k}\cos(2\pi f_c\tau_1)\rho_c(\tau_1) + n_l, \tag{13.14} $$



where, as in the case of interference rejection, $s_l$ and $n_l$ correspond to the data symbol and noise output of a standard demodulator without spreading or despreading, and the approximation assumes $f_c \gg 1/T_s$. The middle term $\alpha_1 s_{l-k}\cos(2\pi f_c\tau_1)\rho_c(\tau_1)$ comes from the following integration:

$$
\begin{aligned}
\frac{2}{T_s}\int_0^{T_s} & s_c(t)s_c(t-\tau_1)\cos(2\pi f_c t)\cos(2\pi f_c(t-\tau_1))\,dt \\
&= \frac{1}{T_s}\int_0^{T_s} s_c(t)s_c(t-\tau_1)\left(\cos(2\pi f_c\tau_1) + \cos(4\pi f_c t - 2\pi f_c\tau_1)\right)dt \\
&\approx \cos(2\pi f_c\tau_1)\frac{1}{T_s}\int_0^{T_s} s_c(t)s_c(t-\tau_1)\,dt \\
&= \cos(2\pi f_c\tau_1)\rho_c(\tau_1),
\end{aligned}
\tag{13.15}
$$

where the approximation is based on $f_c \gg T_c^{-1}$, i.e. the spreading code is relatively constant over one period of the carrier, and





$$ \rho_c(\tau_1) = \frac{1}{T_s}\int_0^{T_s} s_c(t)s_c(t-\tau_1)\,dt \tag{13.16} $$

is the autocorrelation of the spreading code at delay $\tau_1$ over a symbol time.³ More generally, the spreading code autocorrelation at delay $\tau$ over a period $[0, T]$ is defined as

$$ \rho_c(\tau) = \frac{1}{T}\int_0^{T} s_c(t)s_c(t-\tau)\,dt = \frac{1}{N_T}\sum_{n=1}^{N_T} s_c(nT_c)s_c(nT_c-\tau), \tag{13.17} $$

where $N_T = T/T_c$ is the number of chips over duration $T$ and the second equality follows from the fact that $s_c(t)$ is constant over a chip time $T_c$. It can be shown that $\rho_c(\tau)$ is a symmetric function with maximum value at $\tau = 0$. Moreover, if $s_c(t)$ is periodic with period $T$, then the autocorrelation depends only on the time difference of the spreading codes, i.e.

$$ \frac{1}{T}\int_0^{T} s_c(t-\tau_0)s_c(t-\tau_1)\,dt = \rho_c(\tau_1 - \tau_0). \tag{13.18} $$

From (13.15), if $T = T_s$ and $\rho_c(\tau) = \delta(\tau)$, the despreading process removes all ISI. Unfortunately, it is not possible to have finite-length spreading codes with autocorrelation equal to a delta function. Thus, there has been much work on designing spreading codes with autocorrelation over a symbol time that approximates a delta function. In the next section, we discuss spreading codes for ISI rejection, including maximal linear codes, which have excellent autocorrelation properties to minimize ISI effects.

³ Note that if $\tau_1$ is not an integer multiple of a symbol time, then the middle term in (13.14) gets more complicated. In particular, assuming $g(t) = \sqrt{2/T_s}$, if $\tau_1 = (k + \kappa)T_s$, $0 < \kappa < 1$, then $x(t - \tau_1) = \sqrt{2/T_s}\,s_{l-k-1}$ for $0 \le t \le \kappa T_s$ and $x(t - \tau_1) = \sqrt{2/T_s}\,s_{l-k}$ for $\kappa T_s \le t \le T_s$. Thus, the middle term of (13.14) becomes

$$ \alpha_1 s_{l-k-1}\cos(2\pi f_c\tau_1)\frac{1}{T_s}\int_0^{\kappa T_s} s_c(t)s_c(t-\tau_1)\,dt + \alpha_1 s_{l-k}\cos(2\pi f_c\tau_1)\frac{1}{T_s}\int_{\kappa T_s}^{T_s} s_c(t)s_c(t-\tau_1)\,dt, $$

where each term is a function of the spreading code autocorrelation taken over a fraction of the symbol time.

13.2.2 Spreading Codes for ISI Rejection: Random, Pseudorandom, and m-Sequences 

Spreading codes are generated deterministically, often using a shift register with feedback logic to create a binary code sequence $b$ of 1s and 0s. The binary sequence, also called a chip sequence, is used to amplitude modulate a square pulse train with pulses of duration $T_c$, with amplitude 1 for a 1 bit and amplitude -1 for a 0 bit, as shown in Figure 13.6. The resulting spreading code $s_c(t)$ has a sinc-shaped spectrum in the frequency domain, corresponding to the Fourier transform of a square pulse. The shift register, consisting of $n$ stages, has a cyclical output with a maximum period of $2^n - 1$. To avoid a spectral spike at DC or biasing the noise in despreading, the spreading code $s_c(t)$ should have no DC component, which requires that the bit sequence $b$ have approximately the same number of 1s and 0s. It is also desirable for the number of consecutive 1s or 0s, called a run, to be small. Runs are undesirable since if there is a run of $k$ consecutive 1s or 0s, the data signal over $kT_c$ is just multiplied by a constant, which reduces the bandwidth spreading (and its advantages) by roughly a factor of $k$. Ideally the chip values change roughly every chip time, which leads to maximal spreading. Based on (13.15), we require spreading codes with $\rho_c(\tau) \approx \delta(\tau)$ to minimize ISI effects.



Example 13.2: Find the baseband bandwidth of a spreading code $s_c(t)$ with chip time $T_c = 1\ \mu$s.

Solution: The spreading code $s_c(t)$ consists of a sequence of unit amplitude square pulses of duration $T_c$ modulated with $\pm 1$. The Fourier transform of a unit amplitude square pulse is $S(f) = T_c\,\mathrm{sinc}(fT_c)$, with a mainlobe of bandwidth $2/T_c$. Thus, the null-to-null baseband bandwidth, defined as the minimum frequency where $S(f) = 0$, is $1/T_c = 1$ MHz.



While DSSS chip sequences must be generated deterministically, properties of random sequences are useful to gain insight into deterministic sequence design. A random binary chip sequence consists of i.i.d. bit values with probability one half for a one or a zero. A random sequence of length $N$ can thus be generated, for example, by flipping a fair coin $N$ times and setting the bit to a one for heads and a zero for tails. Random sequences with length $N$ asymptotically large have a number of the properties desired in spreading codes [6]. In particular, such sequences will have an equal number of ones and zeros, called the balanced property of a code. Moreover, the run length in such sequences is generally short. In particular, for asymptotically large sequences, half of all runs are of length 1, a quarter are of length 2, and so forth, so that a fraction $1/2^r$ of all runs are of length $r$ for $r$ finite. This distribution on run length is called the run length property of a code. Random sequences also have the property that if they are shifted by any nonzero number of elements, the resulting sequence will have half its elements the same as in the original sequence, and half its elements different from the original sequence. This is called the shift property of a code. Following Golomb [6], a deterministic sequence that has the balanced, run length, and shift properties as it grows asymptotically large is referred to as a pseudorandom sequence. Since these three properties are often the most important in system analysis, DSSS analysis is often done using random spreading sequences instead of deterministic spreading sequences due to their analytical tractability [12, Chapter 2.2].
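The three properties can be observed empirically on a long random sequence. The sketch below is an illustration under assumptions: it uses Python's seeded pseudorandom generator as a stand-in for fair coin flips, and the sequence length and shift value are arbitrary choices.

```python
import random
from collections import Counter


def run_lengths(bits):
    """Lengths of maximal runs of identical consecutive bits."""
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


rng = random.Random(1)                 # seeded stand-in for fair coin flips
N = 100_000
bits = [rng.getrandbits(1) for _ in range(N)]

ones_fraction = sum(bits) / N          # balanced property: close to 1/2
runs = run_lengths(bits)
hist = Counter(runs)
frac_len1 = hist[1] / len(runs)        # run length property: close to 1/2
frac_len2 = hist[2] / len(runs)        # close to 1/4

shift = 17                             # any nonzero cyclic shift
shifted = bits[shift:] + bits[:shift]
agree = sum(1 for a, b in zip(bits, shifted) if a == b) / N   # close to 1/2

print(ones_fraction, frac_len1, frac_len2, agree)
```

For a finite sequence the measured fractions only approximate the asymptotic values, with deviations that shrink as $N$ grows.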

Among all linear codes, spreading codes generated from maximal-length sequences, or m-sequences, have many desirable properties. Maximal-length sequences are a type of cyclic code (see Chapter 8.2.4). Thus, they are generated and characterized by a generator polynomial, and their properties can be derived using algebraic coding theory [2, Chapter 3.3] [12, Chapter 2.2]. These sequences have the maximum period $N = 2^n - 1$ that can be generated by a shift register of length $n$, so the sequence repeats every $NT_c$ seconds. Moreover, since the sequences are cyclic codes, any time shift of an m-sequence is itself an m-sequence. These sequences also have the property that the modulo-2 addition of an m-sequence and a time shift of itself results in a different m-sequence corresponding to a different time shift of the original sequence. This property is called the shift-and-add property of m-sequences. The m-sequences have roughly the same number of 1s and 0s over a period: $2^{n-1} - 1$ zeros and $2^{n-1}$ ones. Thus, spreading codes generated from m-sequences, called maximal linear codes, have a very small DC component. Moreover, maximal linear codes have approximately the same run-length property as random binary sequences, i.e. the fraction of runs of length $r$ in a length-$N$ sequence is $1/2^r$ for $r < n$ and $1/2^{r-1}$ for $r = n$. Finally, the balanced and shift-and-add properties of m-sequences can be used to show that m-sequences have the same shift property as random binary sequences. Hence, since m-sequences have the balanced, run length, and shift properties of random sequences, they belong to the class of pseudorandom (PN) sequences [12, Chapter 2.2].

Figure 13.6: Generation of Spreading Codes
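The balance and run-length properties can be verified on a small m-sequence. The sketch below assumes a 5-stage register implementing the recurrence $a_k = a_{k-3} \oplus a_{k-5}$, which corresponds to the primitive polynomial $x^5 + x^2 + 1$ and gives period $N = 31$; this particular polynomial and the helper names are choices made for illustration, not taken from the text.

```python
from collections import Counter


def m_sequence(n=5, lags=(3, 5)):
    """One period (2**n - 1 bits) of a[k] = a[k-3] XOR a[k-5].

    The default lags are only valid for n = 5 (polynomial x^5 + x^2 + 1).
    """
    length = 2 ** n - 1
    bits = [1] + [0] * (n - 1)          # any nonzero seed works
    while len(bits) < length:
        k = len(bits)
        bits.append(bits[k - lags[0]] ^ bits[k - lags[1]])
    return bits


def cyclic_run_lengths(bits):
    """Run lengths of a periodic sequence, treating it as a cycle."""
    # Rotate so that the sequence starts at a run boundary.
    start = next(i for i in range(len(bits)) if bits[i] != bits[i - 1])
    rot = bits[start:] + bits[:start]
    runs, count = [], 1
    for prev, cur in zip(rot, rot[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


seq = m_sequence()
ones, zeros = seq.count(1), seq.count(0)        # 2**(n-1) and 2**(n-1) - 1
run_hist = Counter(cyclic_run_lengths(seq))
print(ones, zeros, dict(sorted(run_hist.items())))
```

For $n = 5$ there are 16 runs in total: 8 of length 1, 4 of length 2, 2 of length 3, 1 of length 4, and 1 of length 5, matching the fractions $1/2^r$ for $r < n$ and $1/2^{r-1}$ for $r = n$.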

The autocorrelation $\rho_c(\tau)$ of a maximal linear spreading code taken over a full period $T = NT_c$ is given by

$$
\rho_c(\tau) =
\begin{cases}
1 - \dfrac{|\tau|(1 + 1/N)}{T_c} & |\tau| \le T_c \\[2mm]
-1/N & |\tau| > T_c
\end{cases}
\tag{13.19}
$$

for $|\tau| < (N - 1)T_c$, which is illustrated in Figure 13.7. Moreover, since the spreading code is periodic with period $T = NT_c$, the autocorrelation is also periodic with the same period, as shown in Figure 13.8. Thus, if $\tau$ is not within a chip time of $kNT_c$ for any integer $k$, $\rho_c(\tau) = -1/N = -1/(2^n - 1)$. By making $n$ sufficiently large, the impact of multipath at delays that are not within a chip time of $kNT_c$ can be mostly removed. For delays $\tau$ within a chip time of $kNT_c$, the attenuation is determined by the autocorrelation $\rho_c(\tau)$, which increases linearly as $\tau$ approaches $kNT_c$. The power spectrum of $s_c(t)$ is obtained by taking the Fourier transform of its autocorrelation $\rho_c(\tau)$, yielding

$$ P_{s_c}(f) = \frac{1}{N^2}\delta(f) + \frac{N+1}{N^2}\sum_{m \neq 0}\mathrm{sinc}^2\!\left(\frac{m}{N}\right)\delta\!\left(f - \frac{m}{T}\right). \tag{13.20} $$

Since $\rho_c(\tau)$ is periodic, $P_{s_c}(f)$ is discrete, with samples every $1/T = 1/(NT_c)$ Hz.
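The two-valued autocorrelation at integer chip delays can be checked numerically. The sketch below again assumes the 31-chip m-sequence from the recurrence $a_k = a_{k-3} \oplus a_{k-5}$ (an illustrative choice), maps bits to $\pm 1$ chips, and compares the full-period autocorrelation with (13.19) evaluated at multiples of $T_c$; `rho_formula` takes the delay in units of chip times.

```python
def m_sequence(n=5, lags=(3, 5)):
    """One period of a[k] = a[k-3] XOR a[k-5] (valid lags for n = 5 only)."""
    length = 2 ** n - 1
    bits = [1] + [0] * (n - 1)
    while len(bits) < length:
        k = len(bits)
        bits.append(bits[k - lags[0]] ^ bits[k - lags[1]])
    return bits


def periodic_autocorr(chips, shift):
    """Full-period autocorrelation at an integer chip delay."""
    n = len(chips)
    return sum(chips[k] * chips[(k - shift) % n] for k in range(n)) / n


def rho_formula(tau_over_tc, n_chips):
    """Equation (13.19) with the delay expressed in chip times."""
    x = abs(tau_over_tc)
    if x <= 1:
        return 1 - x * (1 + 1 / n_chips)
    return -1 / n_chips


chips = [1 if b else -1 for b in m_sequence()]   # 1 -> +1, 0 -> -1
N = len(chips)
rho = [periodic_autocorr(chips, d) for d in range(N)]
print(rho[0], rho[1], -1 / N)
```

The peak value is 1 at zero delay and every other integer chip delay gives $-1/N$, so a multipath component more than a chip away from the synchronized path is attenuated by a factor of $N$ in amplitude.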

The periodic nature of the autocorrelation $\rho_c(\tau)$ complicates ISI rejection. In particular, from (13.16), the demodulator associated with the data signal in a spread spectrum system attenuates the ISI by the autocorrelation $\rho_c(\tau)$ taken over a symbol time $T_s$. Thus, if the code is designed with $N = T_s/T_c$ chips per symbol, the demodulator computes the autocorrelation over the full period $T_s = NT_c$ and $\rho_c(\tau)$ is as given in (13.19). Setting $N = T_s/T_c$ is sometimes referred to as a short spreading code, since the autocorrelation repeats every symbol time, as shown in Figure 13.8 for $T = T_s$. However, short codes exhibit significant ISI from multipath components delayed by approximately an integer multiple of a symbol time, in particular the first few symbols after the desired symbol. If the period of the code is extended so that $N \gg T_s/T_c$, then only multipath at very large delays is not fully attenuated, and these multipath components typically have low power anyway due to path loss. Setting $N \gg T_s/T_c$ is sometimes referred to as a long spreading code. The problem with long spreading codes is that the autocorrelation (13.17) performed by the demodulator is taken over a partial period $T = T_s \ll NT_c$ instead of the full period $NT_c$. The autocorrelation of a maximal linear code over a partial period is no longer characterized by (13.19), so multipath delayed by more than a chip time is no longer attenuated by $-1/N$. Moreover, the partial period autocorrelation is quite difficult to characterize analytically, since it depends on the starting point in the code over which the partial autocorrelation is taken. By averaging over all starting points, it can be shown that the ISI attenuation associated with the partial autocorrelation is roughly equal to $1/G$ for $G$ the processing gain, where $G \approx T_s/T_c$, the number of chips per symbol [3, Chapter 9.2].

Figure 13.7: Autocorrelation of Maximal Linear Code ($N = T_s/T_c$)

Figure 13.8: Autocorrelation has Period $T = NT_c$.

While maximal length codes have excellent properties for ISI rejection, they have a number of properties that
make them highly suboptimal for exploiting the multiuser capabilities of spread spectrum. In particular, there are
only a small number of maximal length codes of a given length $N$, so at most $N$ users can share the total system
bandwidth for multiuser DSSS based on maximal-length codes. Moreover, maximal length codes generally have
relatively poor cross-correlation properties, at least for some sets of codes. In particular, the normalized code
cross-correlation can be as high as .37 [3, Chapter 9.2]. Therefore, for spread spectrum systems with multiple
users, codes such as Gold, Kasami, or Walsh codes are used instead of maximal length codes, due to their superior
cross-correlation properties. However, these codes can be less effective at ISI rejection than maximal length codes.
More details on these spreading codes will be given in Section 13.4.1.



Example 13.3: Consider a spread spectrum system using maximal linear codes with period $T = T_s$ and $N = 100$
chips per symbol. Assume the synchronizer has a delay offset of $.5T_c$ relative to the LOS signal component to
which it is synchronized. By how much is the power of this signal component reduced by this timing offset?

Solution: For $\tau = .5T_c$ and $N = 100$ the autocorrelation $\rho_c(\tau)$ given by (13.19) is

$$\rho_c(\tau) = 1 - \frac{|\tau|(1 + 1/N)}{T_c} = 1 - \frac{.5T_c(1 + 1/100)}{T_c} = 1 - .5(1.01) = .495.$$

Since the signal component is multiplied by $\rho_c(\tau)$, its power is reduced by $\rho_c^2(\tau) = .495^2 = .245 = -6.11$ dB.
This is a significant reduction in power, indicating the importance of accurate synchronization, which is discussed
in the next section.
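The calculation in Example 13.3 can be checked numerically. The following is a minimal sketch that evaluates the full-period autocorrelation (13.19) and the resulting power loss; the function names `rho_c` and `power_loss_db` are illustrative choices, and the delay is expressed in units of the chip time by default.

```python
import math

def rho_c(tau, n_chips, t_c=1.0):
    """Full-period autocorrelation (13.19) of a maximal linear code.

    Valid for |tau| < (N - 1) * Tc; tau and t_c must share the same units.
    """
    if abs(tau) <= t_c:
        return 1.0 - abs(tau) * (1.0 + 1.0 / n_chips) / t_c
    return -1.0 / n_chips

def power_loss_db(tau, n_chips, t_c=1.0):
    """Power reduction in dB caused by a timing offset tau."""
    return 10.0 * math.log10(rho_c(tau, n_chips, t_c) ** 2)

# Example 13.3: N = 100 chips per symbol, offset of half a chip.
rho = rho_c(0.5, 100)
print(round(rho, 3), round(rho ** 2, 3), round(power_loss_db(0.5, 100), 2))
```

Sweeping `tau` from 0 to one chip time with this function shows how quickly the loss grows as the offset approaches $T_c$, where $\rho_c(\tau)$ falls to $-1/N$.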



13.2.3 Synchronization 

We now examine the operation of the synchronizer in Figure 13.5. We assume a separate carrier phase recovery
loop, so that the carrier in the demodulator is coherent in phase with the received carrier. The synchronizer must
align the timing of the spreading code generator in the receiver with the spreading code associated with one of
the multipath components arriving over the channel. A very common method of synchronization uses a feedback
control loop, as shown in Figure 13.9. The basic premise of the feedback loop is to adjust the delay $\tau$ of the
spreading code generator until the function $w(\tau)$ reaches its peak value. At this point, under ideal conditions, the
spreading code is synchronized to the input, as we now illustrate.




Figure 13.9: Synchronization Loop for DSSS. 

Consider a channel with impulse response $h(t) = \delta(t - \tau_0)$ that just introduces a delay $\tau_0$. Neglecting noise,
the signal input to the synchronizer from Figure 13.5 is $z(t) = x(t - \tau_0)s_c(t - \tau_0)\cos^2(2\pi f_c t)$. The feedback
loop will achieve synchronization when $\tau = \tau_0$. We will first assume that $x(t)$ is a binary-modulated signal that
is constant over the code period, and that the spreading codes are maximal-length codes. We will then discuss
extensions to more general spreading codes and modulated signals. Assume the spreading codes have period
$T = NT_c$, so their autocorrelation over one period is given by (13.19) and shown in Figure 13.7. Then

$$w(\tau) = \frac{1}{T}\int_0^T s_l s_c(t - \tau_0)s_c(t - \tau)\cos^2(2\pi f_c t)\,dt \approx \frac{.5 s_l}{T}\int_0^T s_c(t - \tau_0)s_c(t - \tau)\,dt = .5 s_l \rho_c(\tau - \tau_0), \tag{13.21}$$

from (13.18). Since $\rho_c(\tau - \tau_0)$ reaches its maximum at $\tau - \tau_0 = 0$ and $s_l = \pm 1$, the feedback control loop will
adjust $\tau$ such that $|w(\tau)|$ increases. In particular, suppose $|\tau - \tau_0| > T_c$. Then from (13.19), $\rho_c(\tau - \tau_0) = -1/N$ and
the synchronizer is operating outside the triangular region of the autocorrelation function shown in Figure 13.7.
The feedback control loop will therefore adjust $\tau$, typically in increments of $T_c$, until $|w(\tau)|$ increases above
the level $.5/N$ that it takes when $\rho_c(\tau - \tau_0) = -1/N$. This increase occurs when $\tau$ is adjusted sufficiently such
that $|\tau - \tau_0| < T_c$. At this point the synchronizer is within a chip time of perfect synchronization, which is
sometimes referred to as coarse synchronization or acquisition. In general the channel has many multipath components,
in which case coarse synchronization will synchronize to the first multipath component it finds above a given power
threshold.
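The coarse acquisition procedure can be illustrated with a small chip-level simulation. The sketch below is only a simplified model under several assumptions that are not in the text: a length-31 maximal-length code generated by the recurrence $a_k = a_{k-5} \oplus a_{k-2}$, a noiseless single-path channel whose delay is an integer number of chips, and a detection threshold of 0.5 on the normalized correlation. The function names are hypothetical.

```python
def m_sequence():
    """One period (31 chips) of a +/-1 maximal-length sequence.

    Assumes the recurrence a[k] = a[k-5] XOR a[k-2], which corresponds
    to a primitive degree-5 polynomial.
    """
    bits = [1, 1, 1, 1, 1]                # any nonzero initial state
    while len(bits) < 31:
        bits.append(bits[-5] ^ bits[-2])
    return [1 - 2 * b for b in bits]      # bit 0 -> +1, bit 1 -> -1

def circular_correlation(x, y, shift):
    """(1/N) * sum_k x[k] * y[k - shift], with indices taken modulo N."""
    n = len(x)
    return sum(x[k] * y[(k - shift) % n] for k in range(n)) / n

def serial_search(received, local, threshold=0.5):
    """Advance the local code one chip at a time until |w| exceeds threshold."""
    for delay in range(len(local)):
        if abs(circular_correlation(received, local, delay)) > threshold:
            return delay
    return None                            # no delay passed the threshold

code = m_sequence()
true_delay = 11                            # channel delay in chips (assumed)
received = [code[(k - true_delay) % len(code)] for k in range(len(code))]
print(serial_search(received, code))
```

At every wrong delay the normalized correlation equals $-1/N$, so the loop steps through candidate delays until it reaches the one aligned with the received code. With noise or fractional-chip delays the threshold would have to be chosen more carefully.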

An alternative to the feedback control loop for acquisition is a parallel-search acquisition system. This system
has multiple branches that correlate the received signal against a delayed version of the spreading code, where each
branch has a different delay equal to an integer multiple of the chip time. The synchronizer locks to the branch
with the maximum correlator output. A similar structure is used in a RAKE receiver, discussed in the next section,
to coherently combine multipath components at different delays. For both synchronization methods, the coarse
acquisition often uses short codes with a small period $T$ to reduce acquisition time. If long codes are used, the
acquisition time can be shortened by performing the integration in the feedback loop over a fraction of the entire
code period. In this case, as long as the partial autocorrelation is small for delays bigger than a chip time and above
a given threshold for delays within a chip time, the acquisition loop can compare the partial autocorrelation against
the threshold to determine if coarse acquisition has occurred. For the fine tuning that follows coarse acquisition, long
codes with integration over the full period are typically used to make the synchronization as precise as possible.

Once coarse synchronization is achieved, the feedback control loop makes small adjustments to $\tau$ to try to
fine-tune its delay estimate such that $\tau \approx \tau_0$. This is called fine synchronization or tracking. Suppose through coarse
synchronization we obtain $\tau - \tau_0 = T_c$. Referring to Figure 13.7, this implies that the synchronizer is operating on
the far right edge of the triangular correlation function. As $\tau$ is further decreased, $\tau - \tau_0$ decreases towards zero,
and the synchronization "walks backwards" towards the peak of the autocorrelation at $\tau - \tau_0 = 0$. Once the peak
is attained, the synchronizer locks to the delay $\tau_0$. Due to the time-varying nature of the channel, interference,
multipath, and noise, $\tau$ must be adjusted continuously to optimize synchronization under these dynamic operating
conditions. Spread spectrum tracking often uses the same timing recovery techniques discussed in Chapter 5.6.3
for narrowband systems.

The acquisition and tracking procedures for more general spreading codes are very similar. Since all periodic
spreading codes have an autocorrelation that peaks at zero, the coarse and fine synchronization will adjust their
estimate of the delay to try to maximize the autocorrelation output of the integrator. The synchronization
performance is highly dependent on the shape of the autocorrelation function. A sharp autocorrelation facilitates accurate
fine tuning of the synchronization. Noise, fading, interference, and ISI will also complicate both coarse and fine
synchronization, since the output of the integrator in Figure 13.9 will be distorted by these factors.

When $x(t)$ is not binary or constant over the code period, the integrator output will depend on the data
symbol(s) over the duration of the integration. This is the same situation as in carrier and timing recovery of narrowband
systems with unknown data, discussed in Chapter 5.6, and similar techniques can be applied in this setting. Note
that we have also neglected carrier phase recovery in our analysis, assuming that the receiver has a carrier recovery
loop to obtain a coherent phase reference on the received signal. Carrier recovery techniques were discussed in
Chapter 5.6, but these techniques must be modified for spread spectrum systems, since the spreading codes
impact the carrier recovery process [13]. Acquisition and tracking is a very challenging aspect of spread spectrum
system design, especially in time-varying wireless environments. Much work has been devoted to developing and
analyzing spread spectrum synchronization techniques. Details on the main techniques and their performance can
be found in [8, Chapter 12.5], [5, Chapter 6], [11, Part 4, Chapters 1-2], [2, Chapters 4-5].



13.2.4 RAKE receivers 

The spread spectrum receiver shown in Figure 13.5 will synchronize to one of the multipath components in the
received signal. The multipath component to which it is synchronized is typically the first one acquired during the
coarse synchronization that is above a given threshold. This may not be the strongest multipath component, and
the receiver also treats all other multipath components as interference. A more complicated receiver can have several
branches, with each branch synchronized to a different multipath component. This receiver structure is called a RAKE
receiver$^4$ and typically assumes there is a multipath component at each integer multiple of a chip time. Thus,
the time delay of the spreading code between branches is $T_c$, as shown in Figure 13.10. The RAKE is essentially
another form of diversity combining, since the spreading code induces a path diversity on the transmitted signal so
that independent multipath components separated by more than a chip time can be resolved. Any of the combining
techniques discussed in Chapter 7 may be used.




Figure 13.10: RAKE receiver

In order to study the behavior of RAKE receivers, assume a channel model with impulse response
$h(t) = \sum_{j=0}^{J-1} \alpha_j \delta(t - jT_c)$, where $\alpha_j$ is the gain associated with the $j$th multipath component. This model,
described in Chapter 3.4, can approximate a wide range of multipath environments by matching the statistics of the complex
gains to those of the desired environment. The statistics of the $\alpha_j$'s have been characterized empirically in [9]
for outdoor wireless channels.

$^4$ The name RAKE comes from the notion that the multibranch receiver resembles a garden rake, and has the effect of raking up the
energy associated with the multipath components on each of its branches. The RAKE was invented in the 1950s to deal with the ionospheric
multipath on a spread spectrum HF transcontinental link. The name was coined by the RAKE inventors Paul Green and Bob Price.

With this model, each branch of the RAKE receiver in Figure 13.10 synchronizes
to a different multipath component and coherently demodulates its associated signal. A larger $J$ implies a higher
receiver complexity but also increased diversity. Then, from (13.14) and (13.15), the output of the $i$th branch
demodulator is

$$\hat{s}_l^i = \alpha_i s_l + \sum_{\substack{j=0 \\ j \ne i}}^{J-1} \alpha_j \rho_c(iT_c - jT_c)s_l + n_l^i, \tag{13.22}$$

where $s_l$ is the symbol transmitted over symbol time $[lT_s, (l+1)T_s]$, i.e. the symbol associated with the LOS path,
and we assume $s_l = s_{l-1}$, so $s_l$ is also transmitted over $[lT_s - jT_c, lT_s]$. If $s_l \ne s_{l-1}$ then the ISI term in (13.22) is
more complicated and involves partial autocorrelations. However, in all cases the ISI is reduced by roughly the
autocorrelation $\rho_c((i - j)T_c)$. The diversity combiner coherently combines the demodulator outputs. In particular,
with SC the branch output $\hat{s}_l^i$ with the largest path gain $\alpha_i$ is output from the combiner, with EGC all demodulator
outputs are combined with equal weighting, and with MRC the demodulator outputs are combined with a weight
equal to the branch SNR (or SINR, if the ISI is taken into account). If $\rho_c(\tau) \approx 0$ for $|\tau| > T_c$ then we
can neglect the ISI terms in each branch, and the performance of the RAKE receiver with $J$ branches is identical
to any other $J$-branch diversity technique. A comprehensive study of RAKE performance for empirically-derived
channel models was done by Turin in [9].
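To make the comparison between combiners concrete, the following sketch computes the combiner output SNR for a $J$-branch RAKE when the ISI terms are neglected. It relies on standard diversity-combining results (output SNR equal to the largest branch SNR for SC, to $(\sum_i |\alpha_i|)^2 E_s/(J N_0)$ for EGC, and to the sum of branch SNRs for MRC) and assumes equal noise power on every branch. The function name and the example path gains are illustrative only.

```python
def combined_snr(path_gains, symbol_energy=1.0, noise_power=1.0):
    """Combiner output SNR for SC, EGC, and MRC with ISI neglected.

    Assumes every branch sees independent noise of equal power.
    """
    amps = [abs(a) for a in path_gains]            # works for complex gains too
    branch = [a * a * symbol_energy / noise_power for a in amps]
    return {
        "SC": max(branch),
        "EGC": sum(amps) ** 2 * symbol_energy / (len(amps) * noise_power),
        "MRC": sum(branch),
    }

# Three resolvable paths with decaying gains (assumed values).
for name, snr in combined_snr([1.0, 0.5, 0.25]).items():
    print(name, round(snr, 4))
```

MRC never does worse than the other two combiners, while EGC can fall below SC when some branches are very weak, since those branches then contribute mostly noise.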

Spread spectrum is not usually used for diversity alone, since it requires significantly more bandwidth than 
other diversity techniques. However, if spread spectrum signaling is chosen for its other benefits, such as its mul- 
tiuser or interference rejection capabilities, then RAKEs provide a simple mechanism to obtain diversity benefits. 

13.3 Frequency-Hopping Spread Spectrum (FHSS) 

An end-to-end frequency-hopping spread spectrum system is illustrated in Figure 13.11. The spreading code is
input to the frequency synthesizer to generate the hopping carrier signal $c(t) = \cos(2\pi f_i t + \theta_i(t))$, which is input
to the modulator to upconvert the modulated signal to the carrier frequency. The modulator can be coherent,
noncoherent, or differentially coherent, although coherent modulation is not as common as noncoherent modulation
due to the difficulties in maintaining a coherent phase reference while hopping the carrier over a wide bandwidth
[11, Part 2, Chapter 2]. At the receiver, a synchronizer is used to synchronize the locally generated spreading code
to that of the incoming signal. Once synchronization is achieved, the spreading code is input to the frequency
synthesizer to generate the hopping pattern of the carrier, which is then input to the demodulator for downconversion.
For noncoherent or differentially coherent modulation, it is not necessary to synchronize the phase associated with
the receive carrier to that of the transmit carrier.

As with DSSS, the synchronization procedure for FH systems is typically done in two stages. First, a coarse
synchronization is done to align the receiver hop sequence to within a fraction of the hop duration $T_c$ associated
with the transmitted FH signal. The process is similar to the coarse synchronization of DSSS: the received FH
signal plus noise is correlated with the local hopping sequence by multiplying the signals together and computing
the energy in their product. If this energy exceeds a given threshold, coarse acquisition is obtained; otherwise
the received FH signal is shifted in time by $T_c$ and the process is repeated. Coarse acquisition can also be done
in parallel using multiple hop sequences, each shifted in time by a different integer multiple of $T_c$. Once coarse
acquisition is obtained, fine tuning occurs by continually adjusting the timing of the frequency hopper to maximize
the correlation between the receiver hopping sequence and the received signal. More details on FH synchronization
and an analysis of system performance under synchronization errors can be found in [11, Part 4].

The impact of multipath on FH systems was discussed in Section 13.1, where we saw that a FH system
does not exhibit fading if the multipath components have delay exceeding the hop time, since only one
nonfading signal component arrives during each hop. When multipath does cause flat or frequency-selective fading,
the performance analysis is the same as for a slowly time-varying non-hopping system. However, the impact of
narrowband interference on FH systems, as characterized by the probability of symbol error, is more difficult to
determine. In fact, this error probability depends on the exact structure of the interfering signal and how it impacts
the specific modulation in use, as we now describe.

Figure 13.11: FHSS System Model

We will focus on symbol error probability for a SFH system without coding, where the interference, if present, 
is constant over a symbol time. The analysis for FFH is more complicated, since interference changes over a symbol 
time, making it more difficult to characterize its statistics and the resulting impact on the symbol error probability. 
Assume a SFH system with M out of the N frequency bands occupied by a narrowband interferer. Assuming the 
signal hops uniformly over the entire frequency band, the probability of any given hop being in the same band 
as an interferer is then M/N. The probability of symbol error is obtained by conditioning on the presence of an 
interferer over the given symbol period: 



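$$\begin{aligned} P_s &= p(\text{symbol error} \mid \text{no interference})\,p(\text{no interference}) + p(\text{symbol error} \mid \text{interference})\,p(\text{interference}) \\ &= \frac{N - M}{N}\,p(\text{symbol error} \mid \text{no interference}) + \frac{M}{N}\,p(\text{symbol error} \mid \text{interference}). \end{aligned} \tag{13.23}$$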



In the absence of interference the probability of symbol error just equals that of the modulated data signal
transmitted over an AWGN channel with received SNR $\gamma_s$, which we will denote as $P_s^{AWGN}$. Note that $\gamma_s$ is the
received SNR at the input to the demodulator in the absence of interference, so multipath components removed
in the despreading process do not affect this SNR. However, $\gamma_s$ will be affected by the channel gain at the carrier
frequency for the multipath components that are not removed by despreading. For most coherent modulations,
$P_s^{AWGN} \approx \alpha_M Q(\sqrt{\beta_M \gamma_s})$ for $\alpha_M$ and $\beta_M$ dependent on the modulation, as discussed in Chapter 6.1.6. The
$P_s^{AWGN}$ for noncoherent or differentially coherent modulations in AWGN are generally more complex [10, Chapter 1.1].
Given $P_s^{AWGN}$, it remains only to characterize the probability of error when interference is present,
$p(\text{symbol error} \mid \text{interference})$, in order to determine $P_s$ in (13.23). If we denote this probability as $P_s^{INT}$, then
(13.23) becomes

$$P_s = \frac{N - M}{N} P_s^{AWGN} + \frac{M}{N} P_s^{INT}. \tag{13.24}$$

Let us now examine $P_s^{INT}$ more closely. This symbol error probability will depend on the exact characteristics
of the interference signal. Consider first a narrowband interferer with the same statistics as AWGN within the
bandwidth of the modulated signal. An interferer with these characteristics is sometimes referred to as a partial
band noise jammer. For this type of interferer, $P_s^{INT}$ is obtained by treating the interference as an additional
AWGN component with power $N_J$ within the bandwidth of the modulated signal. The total noise power is then
$N_0 B + N_J$, so the effective SNR in the presence of this interference becomes

$$\gamma_s^{INT} = \gamma_s \frac{N_0 B}{N_0 B + N_J},$$

which yields

$$P_s^{INT} = P_s^{AWGN}(\gamma_s^{INT}). \tag{13.25}$$
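The averaging over jammed and unjammed hops can be evaluated directly. The sketch below computes the symbol error probability of a SFH system under a partial band noise jammer, using the approximation $\alpha_M Q(\sqrt{\beta_M \gamma_s})$ for the AWGN error probability. The default values $\alpha_M = 1$, $\beta_M = 2$ correspond to BPSK and are an assumption made for illustration, as are the function names and the numerical values in the example.

```python
import math

def q_function(x):
    """Gaussian Q-function, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def sfh_symbol_error(snr, n_bands, n_jammed, noise_power, jammer_power,
                     alpha_m=1.0, beta_m=2.0):
    """Average symbol error probability with M of N bands jammed.

    snr is the linear SNR without interference; noise_power is N0*B and
    jammer_power is N_J within the signal bandwidth.
    """
    p_awgn = alpha_m * q_function(math.sqrt(beta_m * snr))
    snr_int = snr * noise_power / (noise_power + jammer_power)
    p_int = alpha_m * q_function(math.sqrt(beta_m * snr_int))
    frac_jammed = n_jammed / n_bands
    return (1.0 - frac_jammed) * p_awgn + frac_jammed * p_int

# 10 dB SNR, 10 of 100 bands jammed, jammer 10 times stronger than the noise.
print(sfh_symbol_error(10.0, 100, 10, 1.0, 10.0))
```

Even with only a small fraction of the bands jammed, the second term usually dominates the average, since the error probability in a jammed band can be orders of magnitude larger than in a clean one.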

Suppose now that the interference consists of a tone at the hopped carrier frequency with some offset phase. Then
the demodulator output $\hat{s}_l$ in Figure 13.11 is given by

$$\hat{s}_l = a_l s_l + n_l + I_l, \tag{13.26}$$

where $a_l$ is the channel gain associated with the received signal after despreading, $n_l$ is the AWGN sample, and
$I_l = \sqrt{I}e^{j\phi_l}$ is the interference term with phase offset $\phi_l$. Note that since this is a wideband channel, fading is
frequency-selective, so the channel gain $a_l$ will depend on the carrier frequency, and some hops may be associated
with very poor channel gains. The impact of the additional interference term $I_l$ will depend on the modulation. For
example, with coherent MPSK, assuming $\angle s_l = 0$,

$$P_s = 1 - p\left(|\angle(a_l s_l + n_l + I_l)| \le \pi/M\right). \tag{13.27}$$

In general, computing $P_s$ for either coherent or noncoherent modulation requires finding the pdf of the phase
$\angle(n_l + I_l)$. This pdf and the resulting $P_s$ are derived in [11, Parts 2-3] for noncoherent, coherent, and differentially
coherent modulations and a number of different interference models. Coding or coding with interleaving is often
used in FH systems to compensate for frequency-selective fading as well as narrowband interference or jamming.
Analysis of coded systems with interference can be found in [11, Part 2, Chapter 2].

13.4 Multiuser DSSS Systems 

Spread spectrum can also be used as a mechanism for many users to share the same spectrum. Using spreading
code properties to support multiple users within the same spread bandwidth is also called spread-spectrum multiple
access (SSMA), which is a special case of code-division multiple access (CDMA). In multiuser spread spectrum,
each user is assigned a unique spreading code or hopping pattern, which is used to modulate their data signal. The
transmitted signals for all users are superimposed in time and in frequency. The spreading codes or hopping patterns
can be orthogonal, in which case users do not interfere with each other under ideal propagation conditions, or they
can be non-orthogonal, in which case there is interference between users, but this interference is reduced by the
spreading code properties. Thus, while spread spectrum for single-user systems is spectrally inefficient, as it uses
more bandwidth than the minimum needed to convey the information signal, spread spectrum multiuser systems
can support an equal or larger number of users in a given bandwidth than other forms of spectral sharing such as
time-division or frequency-division. However, if the spreading mechanisms are non-orthogonal either by design
or through channel distortion, users interfere with each other. If there is too much interference between users, the
performance of all users degrades. Comparison of the spectral efficiency for different spectral sharing methods in
multiuser and cellular systems will be discussed in Chapters 14-15.

Performance of multiuser spread spectrum also depends on whether the multiuser system is a downlink
channel (one transmitter to many receivers) or an uplink channel (many transmitters to one receiver). These
channel models are illustrated in Figure 13.12: the downlink channel is also called a broadcast channel or forward
link, and the uplink channel is also called a multiple access channel or reverse link. The performance differences of
DSSS in uplink and downlink channels result from the fact that in the downlink, all transmitted signals are typically
synchronous, since they originate from the same transmitter. Moreover, both the desired signal and interference
signals pass through the same channel before reaching the desired receiver. In contrast, users in the uplink channel
are typically asynchronous, since they originate from transmitters at different locations, and the transmitted signals
of the users travel through different channels before reaching the receiver. In this section we will analyze the
multiuser properties of DSSS for both downlinks and uplinks. In Section 13.5 we treat multiuser FHSS systems.




Figure 13.12: Downlink and Uplink Channels.



13.4.1 Spreading Codes for Multiuser DSSS 



Multiuser DSSS is accomplished by assigning each user a unique spreading code sequence $s_{c_i}(t)$. As described
in Section 13.2.2, the autocorrelation function of the spreading code determines its multipath rejection properties.
The cross-correlation properties of different spreading codes determine the amount of interference between users
modulated with these codes. For asynchronous users, their signals arrive at the receiver with arbitrary relative
delay $\tau$, and the cross-correlation between the codes assigned to user $i$ and user $j$ over one symbol time with this
delay is given by

$$\rho_{ij}(\tau) = \frac{1}{T_s}\int_0^{T_s} s_{c_i}(t)s_{c_j}(t - \tau)\,dt = \frac{1}{N}\sum_{n=1}^{N} s_{c_i}(nT_c)s_{c_j}(nT_c - \tau). \tag{13.28}$$

For synchronous users, their signals arrive at the receiver aligned in time, so $\tau = 0$ and the cross-correlation
becomes

$$\rho_{ij}(0) = \frac{1}{T_s}\int_0^{T_s} s_{c_i}(t)s_{c_j}(t)\,dt = \frac{1}{N}\sum_{n=1}^{N} s_{c_i}(nT_c)s_{c_j}(nT_c). \tag{13.29}$$



Ideally, since interference between users is dictated by the cross-correlation of the spreading code, we would
like $\rho_{ij}(\tau) = 0\ \forall \tau$, $i \ne j$ for asynchronous users and $\rho_{ij}(0) = 0$, $i \ne j$ for synchronous users to eliminate
interference between users. A set of spreading codes for asynchronous users with $\rho_{ij}(\tau) = 0\ \forall \tau$, $i \ne j$ or for
synchronous users with $\rho_{ij}(\tau = 0) = 0$, $i \ne j$ is called an orthogonal code set. A set of spreading codes that
does not satisfy this cross-correlation property is called a non-orthogonal code set. It is not possible to obtain
orthogonal codes for asynchronous users, and for synchronous users there is only a finite number of spreading
codes that are orthogonal within any given bandwidth. Thus, an orthogonality requirement restricts the number
of different spreading codes (and the corresponding number of users) in a synchronous DSSS multiuser system.
We now describe the most common chip sequences and their associated spreading codes that are used in multiuser
DSSS systems.



Gold Codes 



Gold codes have worse autocorrelation properties than maximal-length codes, but better cross-correlation
properties if properly designed. The chip sequences associated with a Gold code are produced by the binary addition of
two m-sequences each of length $2^n - 1$, and they inherit the balanced, run length, and shift properties of these
component codes, hence are pseudorandom sequences. Gold codes take advantage of the fact that if two distinct
m-sequences with time shifts $\tau_1$ and $\tau_2$ are modulo-2 added together, the resulting sequence is unique for every
unique value of $\tau_1$ or $\tau_2$. Thus, a very large number of unique Gold codes can be generated, which allows for a large
number of users in a multiuser system. However, if the m-sequences that are modulo-2 added to produce a Gold
code are chosen at random, the cross-correlation of the resulting code may be quite poor. Thus, Gold codes are
generated by the chip sequences associated with the modulo-2 addition of preferred pairs of m-sequences. These
preferred pairs are chosen to obtain good cross-correlation in the resulting Gold code. However, the preferred pairs
of m-sequences have different autocorrelation properties than general m-sequences. A method for choosing the
preferred pairs such that the cross-correlation and autocorrelation functions of the resulting Gold code are bounded
was given by Gold in [7], and can also be found in [14], [5, Appendix 7], [3, Chapter 9.2]. The preferred sequences
are chosen so that Gold codes have a three-valued cross-correlation with values

$$\rho_{ij}(\tau) = \begin{cases} -1/N \\ -t(n)/N \\ \frac{1}{N}[t(n) - 2] \end{cases}, \tag{13.30}$$

where

$$t(n) = \begin{cases} 2^{(n+1)/2} + 1 & n \text{ odd} \\ 2^{(n+2)/2} + 1 & n \text{ even} \end{cases}. \tag{13.31}$$

The autocorrelation takes on the same three values.
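The three-valued cross-correlation of Gold codes can be checked numerically for a short code. The sketch below is an illustration under assumptions that go beyond the text: it uses $n = 5$ ($N = 31$), generates one m-sequence from the recurrence $a_k = a_{k-5} \oplus a_{k-2}$, and obtains the second member of the preferred pair by taking every third chip of the first, which is one standard way of constructing a preferred pair for odd $n$. All function and variable names are hypothetical.

```python
def m_sequence():
    """One period (31 chips) of a +/-1 m-sequence, a[k] = a[k-5] XOR a[k-2]."""
    bits = [1, 1, 1, 1, 1]
    while len(bits) < 31:
        bits.append(bits[-5] ^ bits[-2])
    return [1 - 2 * b for b in bits]      # modulo-2 addition becomes a product

def t_of_n(n):
    """t(n) from (13.31)."""
    return 2 ** ((n + 1) // 2) + 1 if n % 2 else 2 ** ((n + 2) // 2) + 1

def correlation_values(x, y):
    """Set of unnormalized periodic correlations N * rho(tau) over all chip shifts."""
    n = len(x)
    return {sum(x[k] * y[(k - tau) % n] for k in range(n)) for tau in range(n)}

n = 5
u = m_sequence()
N = len(u)
v = [u[(3 * k) % N] for k in range(N)]    # every third chip of u (assumed preferred pair)

# Gold codes: chip-wise product of u with each cyclic shift of v.
gold = [[u[k] * v[(k + shift) % N] for k in range(N)] for shift in range(N)]

allowed = {-1, -t_of_n(n), t_of_n(n) - 2}
print(sorted(correlation_values(u, v)), sorted(allowed))
print(sorted(correlation_values(gold[0], gold[7])))
```

Dividing the printed values by $N = 31$ gives the normalized cross-correlations in (13.30). With $t(5) = 9$ the worst-case magnitude is $9/31 \approx .29$.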



Kasami Codes 

Kasami chip sequences have similar properties to the preferred sequences used to generate Gold codes, and are
also derived from m-sequences. However, the Kasami codes have better cross-correlation properties than Gold
codes. There are two different sets of Kasami chip sequences that are used to generate Kasami codes, the large
set and the small set. To generate the small set, we begin with an m-sequence $a$ of length $2^n - 1$ for $n$ even and
form a new shorter sequence $a'$ by sampling every $2^{n/2} + 1$ elements of $a$. The resulting sequence $a'$ will have
period $2^{n/2} - 1$. We then generate a small set of Kasami sequences by taking the modulo-2 sum of $a$ with all cyclic
shifts of the $a'$ sequence. There are $2^{n/2} - 2$ such cyclic shifts, and by also including the original sequence $a$ in
the set, we obtain a set of $2^{n/2}$ binary sequences of length $2^n - 1$. As with the Gold codes, the autocorrelation and
cross-correlation of the Kasami spreading codes obtained from the Kasami chip sequences are three-valued, taking
on the values

$$\rho_{ij}(\tau) = \begin{cases} -1/N \\ -s(n)/N \\ \frac{1}{N}[s(n) - 2] \end{cases}, \tag{13.32}$$

where $s(n) = 2^{n/2} + 1$. Since $|s(n)| < |t(n)|$, Kasami codes have better autocorrelation and cross-correlation than
Gold codes. In fact, the Kasami codes achieve the Welch lower bound for the autocorrelation and cross-correlation
for any set of $2^{n/2}$ sequences of length $2^n - 1$, and hence are optimal in terms of minimizing the autocorrelation
and cross-correlation for any such code [14], [11, Part 1, Chapter 5].

The large set of Kasami sequences is formed in a similar way as the small set. It has a larger number of
sequences than the smaller set, and hence can support more users in a multiuser system, but the autocorrelation
and cross-correlation properties across the spreading codes generated from this larger set are inferior to those
generated from the smaller set. To obtain the large set, we take an m-sequence $a$ of length $N = 2^n - 1$ for $n$
even and form two new sequences $a'$ and $a''$ by sampling the original sequence every $2^{n/2} + 1$ elements for $a'$ and
every $2^{(n+2)/2} + 1$ elements for $a''$. The set is then formed by adding $a$, $a'$, and $a''$ for all cyclic shifts of $a'$
and $a''$. The number of such sequences is $2^{3n/2}$ if $n$ is a multiple of 4 and $2^{3n/2} + 2^{n/2}$ if $\mathrm{mod}_4(n) = 2$. The
autocorrelation and cross-correlation of the spreading codes generated from this set can take on one of five values:

$$\rho(\tau) = \begin{cases} -1/N \\ \frac{1}{N}\left(-1 \pm 2^{n/2}\right) \\ \frac{1}{N}\left(-1 \pm (2^{n/2} + 1)\right) \end{cases}. \tag{13.33}$$

Since these values exceed those for codes generated from the small Kasami set, we see that the Kasami codes
generated from the large Kasami set have inferior cross-correlation and autocorrelation properties to those generated
from the small Kasami set.



Example 13.4: Find the number of sequences and the magnitude of the worst-case cross-correlation for small and
large Kasami sequences with $n = 10$.

Solution: For the small set, there are $2^{n/2} = 2^5 = 32$ sequences. From (13.32), the largest magnitude
cross-correlation is

$$\frac{1}{N}\left[2^{n/2} + 1\right] = \frac{1}{2^{10} - 1}\left[2^5 + 1\right] = .032.$$

For the large set, $\mathrm{mod}_4(10) = 2$, so there are $2^{3n/2} + 2^{n/2} = 2^{15} + 2^5 = 32{,}800$ sequences, 3 orders of magnitude
more codes than in the small set. The largest magnitude cross-correlation is

$$\frac{1}{N}\left[2^{n/2} + 2\right] = \frac{1}{2^{10} - 1}\left[2^5 + 2\right] = .033.$$

So there is a slightly larger cross-correlation, the price paid for the significant increase in the number of codes.
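The set sizes and worst-case correlation magnitudes for Kasami codes follow directly from the formulas for the small and large sets, so they are easy to tabulate for any even $n$. The sketch below evaluates them as stated in the text; the function names are illustrative. Note that for $n = 10$ the term $2^{n/2}$ equals $2^5 = 32$, so the large-set size formula $2^{3n/2} + 2^{n/2}$ evaluates to 32,800.

```python
def kasami_small(n):
    """(number of sequences, worst-case |cross-correlation|) for the small set."""
    if n % 2:
        raise ValueError("Kasami sequences require even n")
    big_n = 2 ** n - 1
    return 2 ** (n // 2), (2 ** (n // 2) + 1) / big_n

def kasami_large(n):
    """(number of sequences, worst-case |cross-correlation|) for the large set."""
    if n % 2:
        raise ValueError("Kasami sequences require even n")
    big_n = 2 ** n - 1
    count = 2 ** (3 * n // 2)
    if n % 4 == 2:
        count += 2 ** (n // 2)
    return count, (2 ** (n // 2) + 2) / big_n

for n in (6, 8, 10):
    print(n, kasami_small(n), kasami_large(n))
```

The table makes the tradeoff visible: the large set offers roughly $2^n$ times as many codes as the small set, while the worst-case correlation magnitude computed from these formulas grows only by $1/N$.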



Walsh-Hadamard Codes 

Walsh-Hadamard codes of length N = T s /T c that arc synchronized in time arc orthogonal over a symbol time, so 
that the cross-correlation of any two sequences is zero. Thus, synchronous users modulated with Walsh-Hadamard 
codes can be separated out at the receiver with no interference between them, as long as the channel does not 
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corrupt the orthogonality of the codes (Delayed multipath components arc not synchronous with the LOS paths, 
and thus the multipath components associated with different users will cause interference between users. The loss 
of orthogonality can be quantified by the orthogonality factor [15]). While it is possible to synchronize users on the 
downlink, where all signals originate from the same transmitter, it is more challenging to synchronize users in the 
uplink, since they are not co-located. Hence, Walsh-Hadamard codes arc rarely used for DSSS uplink channels. 
Walsh-Hadamard sequences of length N are obtained from the rows of an N x N Hadamard matrix H tv- For 
N = 2 the Hadamard matrix is 



Larger Hadamard matrices arc obtained using H 2 and the recursion 



H2JV — 



Htv Htv 
Htv -Htv 



Each row of $\mathbf{H}_N$ specifies the chip sequence associated with a different sequence, so the number of spreading
codes in a Walsh-Hadamard code is $N$. Thus, DSSS with Walsh-Hadamard sequences can support at most
$N = T_s/T_c$ users. Since DSSS uses roughly $N$ times more bandwidth than required for the information signal,
approximately the same number of users could be supported by dividing up the total system bandwidth into $N$
nonoverlapping channels (frequency-division). Similarly, the same number of users can be supported by dividing
time up into $N$ orthogonal timeslots (time-division), where each user operates over the entire system bandwidth
during its timeslot. Hence, any multiuser technique that assigns orthogonal channels to the users such that they
do not interfere with each other accommodates approximately the same number of users.
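The Hadamard recursion and the orthogonality of its rows can be illustrated in a few lines. The following is a sketch only: the helper names `hadamard` and `cross_correlation` are hypothetical, and the cross-correlation is the chip-level, zero-offset average rather than the continuous-time integral.

```python
def hadamard(n):
    """Build the n x n Hadamard matrix (n a power of 2) via H_2N = [[H, H], [H, -H]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        top = [row + row for row in h]                     # [H  H]
        bottom = [row + [-c for c in row] for row in h]    # [H -H]
        h = top + bottom
    return h


def cross_correlation(a, b):
    """Normalized zero-offset cross-correlation of two chip sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


H8 = hadamard(8)
for row in H8:
    print(row)
# Distinct rows have zero cross-correlation; each row has unit autocorrelation.
print(cross_correlation(H8[1], H8[5]), cross_correlation(H8[3], H8[3]))
```

An N x N matrix yields exactly N mutually orthogonal codes, which is the limit on the number of users noted above.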

The performance of a DSSS multiuser system depends on both the spreading code properties and the
channel over which the system operates. In the next section we will study performance of DSSS multiuser systems
over downlinks. Performance over uplinks will be treated in Section 13.4.3.



13.4.2 Downlink Channels 

The transmitter for a DSSS downlink system is shown in Figure 13.13 and the channel and receiver in Figure 13.14.
In the downlink the signals of all users are typically sent simultaneously by the transmitter (base station), and each
receiver must demodulate its individual signal. Thus we can assume that all signals are synchronous, which allows
the use of orthogonal spreading codes such as the Walsh-Hadamard codes. However, the use of orthogonal codes
limits the number of users the downlink can support, so such codes are not always used.

Consider a $K$-user system, where the transmitter sends to $K$ independent users. The baseband modulated
signal associated with the $k$th user is
\[
x_k(t) = \sum_l s_{kl}\, g(t - lT_s), \tag{13.34}
\]
where $g(t) = \sqrt{2/T_s}$ is the pulse shape, assumed rectangular, $T_s$ is the symbol time, and $s_{kl}$ is the $k$th user’s symbol
over the $l$th symbol time. The transmitter consists of $K$ branches, where the $k$th branch multiplies the $k$th user’s signal
$x_k(t)$ with the spreading code $s_{c_k}(t)$. The branches are summed together, resulting in the baseband multiuser
signal, which over the $l$th symbol time is
\[
x(t) = \sum_{k=1}^{K} x_k(t)\, s_{c_k}(t) = \sum_{k=1}^{K} \sqrt{\frac{2}{T_s}}\, s_{kl}\, s_{c_k}(t). \tag{13.35}
\]
This multiuser signal is multiplied by the carrier to obtain the passband signal $s(t)$, which is transmitted over the
channel.

Figure 13.13: Downlink Transmitter.



The signal received by user $k$ first passes through the $k$th user’s channel, which has impulse response $h_k(t)$
and AWGN. Thus the received signal at the $k$th user’s receiver is $s(t) * h_k(t) + n(t)$. This signal is downconverted
and then multiplied by the $k$th user’s spreading code $s_{c_k}(t)$, which is assumed to be perfectly synchronized to the
$k$th user’s spreading code in the received signal.$^5$ The signal is then baseband demodulated via a matched filter,
i.e. it is multiplied by $\sqrt{2/T_s}$ and integrated over a symbol time. The demodulator output is sampled every $T_s$ to
obtain an estimate of the symbol transmitted by the $k$th user over that symbol time. Comparing Figures 13.5 and
13.14, we see that the $k$th user’s receiver is identical to the matched-filter detector in a single-user DSSS system.
Thus, in the absence of multiuser interference, the $k$th user has the same performance as in a single-user DSSS
system. However, when multiuser interference is taken into account, the demodulator output includes components
associated with the $k$th user’s signal, interference terms from other users’ signals, and noise.

5 This synchronization is even more difficult than in the single-user case, since it must be done in the presence of multiple spread signals.
In fact some spreading code sets are obtained by shifting a single spreading code by some time period. For these systems there must be
some control channel to inform the receiver which time shift corresponds to its desired signal. More details on the synchronization for these
systems can be found in [5].

In particular, the demodulator output associated with the $k$th user over the $l$th symbol time is given by



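\[
\begin{aligned}
\hat{s}_k &= \sqrt{\frac{2}{T_s}} \int_0^{T_s} \left[ s(t) * h_k(t) + n(t) \right] s_{c_k}(t) \cos(2\pi f_c t)\, dt \\
&= \sqrt{\frac{2}{T_s}} \int_0^{T_s} \left[ x(t) * h_k^{LP}(t) \right] s_{c_k}(t) \cos^2(2\pi f_c t)\, dt
 + \sqrt{\frac{2}{T_s}} \int_0^{T_s} n(t)\, s_{c_k}(t) \cos(2\pi f_c t)\, dt \\
&= \frac{2}{T_s} \int_0^{T_s} \left[ \sum_{j=1}^{K} s_{jl}\, s_{c_j}(t) * h_k^{LP}(t) \right] s_{c_k}(t) \cos^2(2\pi f_c t)\, dt
 + \sqrt{\frac{2}{T_s}} \int_0^{T_s} n(t)\, s_{c_k}(t) \cos(2\pi f_c t)\, dt \\
&= \frac{2}{T_s} \int_0^{T_s} \left[ s_{kl}\, s_{c_k}(t) * h_k^{LP}(t) \right] s_{c_k}(t) \cos^2(2\pi f_c t)\, dt \\
&\quad + \frac{2}{T_s} \int_0^{T_s} \left[ \sum_{\substack{j=1 \\ j \neq k}}^{K} s_{jl}\, s_{c_j}(t) * h_k^{LP}(t) \right] s_{c_k}(t) \cos^2(2\pi f_c t)\, dt
 + \sqrt{\frac{2}{T_s}} \int_0^{T_s} n(t)\, s_{c_k}(t) \cos(2\pi f_c t)\, dt,
\end{aligned}
\tag{13.36}
\]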



where $h_k^{LP}(t)$ is the baseband equivalent lowpass filter for $h_k(t)$, $s_{kl}$ is the $k$th user’s transmitted symbol over the
$l$th symbol period that is being recovered, and $s_{jl}$ is the transmitted symbol of the $j$th user over this symbol period,
which causes interference. Note that (13.36) consists of three separate terms. The first term corresponds to the
received signal of the $k$th user alone, the second term represents interference from other users in the system, and
the last term is the AWGN sample, which we denote as $n_k$. The first term and the noise sample are characterized
by the analysis in Section 13.2 for single-user systems. The second term depends on both the channel $h_k^{LP}(t)$ and
the spreading code properties, as we now show.




Figure 13.14: Downlink Channel and Receiver. 

To examine the characteristics of the multiuser interference, let us first assume that the $k$th user’s channel has
gain $\alpha_k$ but no delayed multipath components, i.e. $h_k(t) = h_k^{LP}(t) = \alpha_k \delta(t)$. Then (13.36) becomes
\[
\begin{aligned}
\hat{s}_k &= \frac{2}{T_s} \int_0^{T_s} \alpha_k s_{kl}\, s_{c_k}^2(t) \cos^2(2\pi f_c t)\, dt
 + \frac{2}{T_s} \int_0^{T_s} \sum_{\substack{j=1 \\ j \neq k}}^{K} \alpha_k s_{jl}\, s_{c_j}(t)\, s_{c_k}(t) \cos^2(2\pi f_c t)\, dt + n_k \\
&\approx \alpha_k s_{kl} + \alpha_k \sum_{\substack{j=1 \\ j \neq k}}^{K} s_{jl}\, \rho_{jk}(0) + n_k,
\end{aligned}
\tag{13.37}
\]

i=i 

where $\rho_{jk}(0)$ is the cross-correlation between $s_{c_k}(t)$ and $s_{c_j}(t)$ for a timing offset of zero, since the users are
assumed to be synchronous.$^6$ We define
\[
I_{kl} = \alpha_k \sum_{\substack{j=1 \\ j \neq k}}^{K} s_{jl}\, \rho_{jk}(0) \tag{13.38}
\]

as the multiuser interference to the $k$th user at the demodulator output. We see from (13.37) that the $k$th user’s
symbol $s_{kl}$ is attenuated by the channel gain but not affected by the spreading and despreading, exactly as in the
single-user case. The noise sample $n_k$ is also the same as in a single-user nonspread system. The interference
from other users is attenuated by the $k$th user’s channel gain $\alpha_k$ and the cross-correlation of the codes $\rho_{jk}(0)$.
For orthogonal codes, e.g. Walsh-Hadamard codes, $\rho_{jk}(0) = 0$ so there is no interference between users. For
non-orthogonal codes $\rho_{jk}(0)$ depends on the specific codes assigned to users $j$ and $k$, e.g. for Kasami codes $\rho_{jk}(0)$
can take on one of three possible values. Note that both the $k$th user’s signal and the interference are attenuated
by the same channel gain $\alpha_k$, since both signal and interference follow the same path from the transmitter to the
receiver. As we will see in the next section, this is not the case for DSSS uplink systems.
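The structure of (13.37) can be reproduced with a chip-rate model. The sketch below makes several simplifying assumptions that are not part of the analysis above: one symbol per user, no carrier and no noise, a single real channel gain, and integration replaced by an average over chips. The function name `despread_output` is hypothetical.

```python
def despread_output(symbols, codes, k, gain):
    """Chip-rate version of (13.37): gain * (s_k + sum_j s_j * rho_jk(0)), noise-free."""
    n_chips = len(codes[k])
    # The base station sums every user's spread symbol chip by chip.
    tx = [sum(s * c[i] for s, c in zip(symbols, codes)) for i in range(n_chips)]
    # Flat downlink channel: signal and interference see the same gain.
    rx = [gain * chip for chip in tx]
    # Despread with user k's code and normalize by the number of chips.
    return sum(r * c for r, c in zip(rx, codes[k])) / n_chips


# Orthogonal (Walsh-Hadamard) codes: only the attenuated desired symbol remains.
walsh = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print(despread_output([1, -1, 1], walsh, k=0, gain=0.5))

# Non-orthogonal codes with rho = 0.5: the interferer's symbol leaks through.
nonorth = [[1, 1, 1, 1], [1, 1, 1, -1]]
print(despread_output([1, -1], nonorth, k=0, gain=0.5))
```

In the second case the output is 0.5 (1 - 0.5) = 0.25: the desired symbol and the interference are scaled by the same gain, as the text notes for the downlink.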

If the interference in a multiuser system has approximately Gaussian statistics then we can treat interference as
an additional noise term and determine system performance based on the signal-to-noise-plus-interference power
ratio (SINR) for each user. However, the Gaussian approximation is often inaccurate, even when the number of
interferers is large [16]. Moreover, in fading the interference terms are correlated, since they all experience the same
fading $\alpha_k$. Thus, the interference can only be approximated as conditionally Gaussian, conditioned on the fading.
The conditionally Gaussian approximation is most accurate when the number of interferers is large, since the sum
of a large number of random variables converges to a Gaussian random variable by the CLT.$^7$ The SINR for the $k$th
user is defined as the ratio of power associated with the $k$th user’s signal over the average power associated with
the multiuser interference and noise at the demodulator output. The $k$th user’s performance is then analyzed based
on the BER in AWGN with SNR replaced by the SINR for this user. Moreover, if the interference power is much
greater than the system noise power, then we can neglect the noise altogether and determine performance based on
an AWGN channel analysis with SNR replaced by the signal-to-interference power ratio (SIR) for each user. The
SIR for the $k$th user is defined as the ratio of power associated with the $k$th user’s signal over the average power
associated with the multiuser interference alone. Multiuser spread spectrum systems where noise can be neglected
in the performance analysis are called interference-limited. For both SINR and SIR, the average interference
power depends on the specific spreading sequences and symbol transmissions of the interfering users, which can
be highly complex to analyze. As an alternative, average interference power is often computed assuming random
spreading sequences. With this assumption it can be shown that the SIR for a synchronous $K$-user system with
$N$ chips per symbol is given by [17, Chapter 2.3]
\[
\mathrm{SIR} = \frac{N}{K-1} \approx \frac{G}{K-1}, \tag{13.39}
\]
where $G \approx N$ is the processing gain of the system. Note that this matches the SIR expression (13.6) under arbitrary
interference. If noise is taken into account, the SINR is obtained from (13.39) by adding in noise scaled by the
energy per symbol $E_s$:
\[
\mathrm{SINR} = \frac{1}{\dfrac{N_0}{E_s} + \dfrac{K-1}{G}}. \tag{13.40}
\]

6 If the users were not synchronous, which is unusual in a BS, then the cross-correlation $\rho_{jk}(0)$ in (13.37) would be replaced by
$\rho_{jk}(\tau_{jk})$ for $\tau_{jk}$ the relative delay between the received signal from users $j$ and $k$. This assumes $s_{jl}$ is constant over the integration; if not,
the interference term depends on the different symbol values over the integration.

7 This is true even if the random variables are not i.i.d., as long as they decorrelate.

Now consider a more general channel $h_k(t) = \sum_{m=1}^{M} \alpha_{km} \delta(t - \tau_{km})$. The output of the demodulator will
again consist of three terms: the first corresponding to the $k$th user’s signal, the second corresponding to the
interference from other users, and the last an AWGN noise sample, which is not affected by the channel. The
signal component associated with the $k$th user is analyzed the same way as in Section 13.2 for multipath channels:
the delayed signal components are attenuated by the autocorrelation of the $k$th user’s spreading code. The multiuser
interference is more complicated than before. In particular, assuming the demodulator is synchronized to the LOS
component of the $k$th user, the demodulator output corresponding to the multiuser interference is given by



\[
\begin{aligned}
I_{kl} &= \frac{2}{T_s} \int_0^{T_s} \sum_{\substack{j=1 \\ j \neq k}}^{K} \sum_{m=1}^{M} \alpha_{km}\, s_{j(l-l_m)}\, s_{c_j}(t - \tau_{km}) \cos(2\pi f_c (t - \tau_{km}))\, s_{c_k}(t) \cos(2\pi f_c t)\, dt \\
&\approx \sum_{\substack{j=1 \\ j \neq k}}^{K} \sum_{m=1}^{M} \alpha_{km}\, s_{j(l-l_m)} \cos(2\pi f_c \tau_{km})\, \rho_{jk}(\tau_{km}),
\end{aligned}
\tag{13.41}
\]
where $s_{j(l-l_m)}$ is the symbol associated with the $j$th user over the $(lT_s - \tau_{km})$th symbol time. Comparing (13.38)
and (13.41), we see that the multipath channel affects the multiuser interference in two ways. First, there are more
interference terms: whereas there were $K - 1$ before, we now have $(K - 1)M$, so each interfering user contributes
$M$ interference terms, one for each multipath component. In addition, the cross-correlation of the codes is no
longer taken at delay $\tau = 0$, even though the users are synchronous. In other words, the multipath destroys the
synchronicity of the channel. This is significant, since orthogonal codes like the Walsh-Hadamard codes typically
only have zero cross-correlation at zero delay. So if a Walsh-Hadamard multiuser system operates in a multipath
channel, the users will interfere.
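A two-code illustration of this loss of orthogonality is sketched below. It assumes a one-chip delay modeled as a cyclic shift (equivalent to assuming the interferer repeats the same symbol in the adjacent symbol period); the helper names are hypothetical.

```python
def cross_correlation(a, b):
    """Normalized chip-level cross-correlation of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def delay(code, chips):
    """Cyclically delay a chip sequence (assumes the neighboring symbol repeats)."""
    chips %= len(code)
    return code[-chips:] + code[:-chips] if chips else list(code)


# Two rows of the 4 x 4 Hadamard matrix.
row2 = [1, 1, -1, -1]
row3 = [1, -1, -1, 1]

print(cross_correlation(row2, row3))             # aligned: orthogonal
print(cross_correlation(row2, delay(row3, 1)))   # one-chip delay: fully correlated
```

Here a one-chip delay turns one Walsh-Hadamard row into another, so a delayed multipath copy of one user's signal is indistinguishable from the other user's code.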



Example 13.5: Consider a DSSS downlink with bandwidth expansion $N = B_s/B = 100$. Assume the system
is interference-limited and there is no multipath on any user’s channel. How many users can the system support
under BPSK modulation such that each user has a BER less than $10^{-3}$?



Solution: For BPSK, $P_b = Q(\sqrt{2\gamma_b})$ and $\gamma_b = 6.79$ dB yields $P_b = 10^{-3}$. Since the system is interference-limited,
we set the SIR equal to the SNR $\gamma_b = 6.79$ dB and solve for $K$, the number of users:
\[
\mathrm{SIR} = \frac{N}{K-1} = \frac{100}{K-1} = 10^{0.679} = 4.775.
\]
Solving for $K$ yields $K \le 1 + 100/4.775 = 21.94$. Since $K$ must be an integer and we require $P_b < 10^{-3}$, 21.94
must be rounded down to 21 users, although typically a designer would build the system to support 22 users with
a slight BER penalty.
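The calculation in Example 13.5 can be reproduced numerically. The sketch below assumes the interference-limited SIR of (13.39) with $G = N$ and the BPSK error probability $Q(\sqrt{2\,\mathrm{SIR}})$; the function names are hypothetical, and the Q-function is computed from the standard library's complementary error function.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def bpsk_ber(n_chips, n_users):
    """BER with SNR replaced by SIR = N / (K - 1), as in (13.39)."""
    sir = n_chips / (n_users - 1)
    return q_function(math.sqrt(2 * sir))


required_sir = 10 ** (6.79 / 10)              # 6.79 dB in linear units
k_max = math.floor(1 + 100 / required_sir)    # largest K meeting the SIR target
print(k_max)
print(bpsk_ber(100, k_max), bpsk_ber(100, k_max + 1))
```

The second line of output shows the BER on either side of the limit: just under the target with 21 users and just over it with 22, which is the slight penalty mentioned in the example.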
13.4.3 Uplink Channels



We now consider DSSS for uplink channels. In multiuser DSSS the spreading code properties are used to separate
out the received signals from the different users. The main difference in using DSSS on the uplink versus the
downlink is that in the downlink both the $k$th user’s signal and the interfering signals from other users pass through
the same channel from the transmitter to the $k$th user’s receiver. In an uplink the signals received from each user at
the receiver travel through different channels. This gives rise to the near-far effect, where users that are close to
the uplink receiver can cause a great deal of interference to users farther away, as discussed in more detail below.

The transmitter and channel for each individual user in a $K$-user uplink are shown in Figure 13.15. The
transmitters are typically not synchronized, since they are not co-located. In general the asynchronous uplink is
more complex to analyze than the synchronous uplink and has worse performance. We see from Figure 13.15
that the $k$th user generates the baseband modulated signal $x_k(t)$. As in the downlink model we assume rectangular
pulses for $x_k(t)$. The $k$th user multiplies its baseband signal $x_k(t)$ by its spreading code $s_{c_k}(t)$ and then upconverts
to the carrier frequency to form the $k$th user’s transmitted signal $s_k(t)$. Note that the carrier signals for each user
have different phase offsets. This signal is sent over the $k$th user’s channel, which has impulse response $h_k(t)$. After
transmission through their respective channels, all users’ signals are summed at the receiver front end together with
AWGN $n(t)$.




Figure 13.15: DSSS Uplink System. 



The uplink received signal is thus given by
\[
r(t) = \left[ \sum_{k=1}^{K} \left( x_k(t)\, s_{c_k}(t) \cos(2\pi f_c t + \phi_k) \right) * h_k(t) \right] + n(t). \tag{13.42}
\]



The receiver consists of $K$ branches corresponding to the $K$ received signals, as shown in Figure 13.16. We assume
the $k$th user’s channel introduces a delay of $\tau_k$, and the impact of this delay on the local carrier phase is incorporated
in the phase offset $\phi_k'$. For synchronous users $\tau_k = 0$. The $k$th branch downconverts the signal to baseband and
then multiplies the received signal by the $k$th user’s spreading code, synchronized to the delay of the $k$th user’s
incoming signal. The despread signal is then passed through a matched filter and sampled to obtain an estimate
of each user’s transmitted symbol over the $l$th symbol time. Comparing Figures 13.5 and 13.16, we see that the
$k$th branch of the uplink receiver is identical to the matched-filter detector in a single-user DSSS system. Thus,
the uplink receiver consists of a bank of $K$ single-user matched-filter detectors, and in the absence of multiuser
interference the $k$th user has the same performance as in a single-user system. With multiuser interference taken
into account the demodulator output of the $k$th receiver branch over the $l$th symbol time is given by



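\[
\begin{aligned}
\hat{s}_k &= \sqrt{\frac{2}{T_s}} \int_0^{T_s} \left[ \sum_{j=1}^{K} x_j(t)\, s_{c_j}(t) * h_j^{LP}(t) \right] s_{c_k}(t - \tau_k) \cos(2\pi f_c t + \phi_k') \cos(2\pi f_c t + \phi_j')\, dt + n_k \\
&= \frac{2}{T_s} \int_0^{T_s} \left[ s_{kl}\, s_{c_k}(t) * h_k^{LP}(t) \right] s_{c_k}(t - \tau_k) \cos^2(2\pi f_c t + \phi_k')\, dt \\
&\quad + \frac{2}{T_s} \int_0^{T_s} \left[ \sum_{\substack{j=1 \\ j \neq k}}^{K} s_{ljk}\, s_{c_j}(t) * h_j^{LP}(t) \right] s_{c_k}(t - \tau_k) \cos(2\pi f_c t + \phi_k') \cos(2\pi f_c t + \phi_j')\, dt + n_k,
\end{aligned}
\tag{13.43}
\]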



where $n_k$ is the AWGN sample, $h_j^{LP}(t)$ is the baseband equivalent lowpass filter for $h_j(t)$, $j = 1, \ldots, K$, and $s_{ljk}$
is the symbol transmitted over the $j$th user’s channel at time $[lT_s - \tau_j + \tau_k, (l+1)T_s - \tau_j + \tau_k]$, which we assume
to be constant. If this symbol takes different values on this interval, i.e. it changes values at $lT_s$, then the ISI term
is more complicated and involves partial cross-correlations, but the ISI attenuation is roughly the same. Note that
(13.43) consists of three separate terms. The first term corresponds to the received signal of the $k$th user alone,
and the last term is the AWGN sample: these two terms are the same as for a single-user system. The second term
represents interference from other users in the system, and the interference of the $j$th user to the $k$th user, $j \neq k$,
depends on the $j$th user’s lowpass equivalent channel $h_j^{LP}(t)$ and the spreading code properties, as we now show.




Figure 13.16: Uplink Receiver. 

Assume that each user’s channel just introduces a gain $\alpha_j$ and delay $\tau_j$, so $h_j^{LP}(t) = \alpha_j \delta(t - \tau_j)$. Then the
demodulator output for the $k$th branch over the $l$th symbol time becomes
\[
\hat{s}_k = \alpha_k s_{kl} + I_{kl} + n_k, \tag{13.44}
\]
where the first and third terms are the same as for a single-user system with this channel, assuming the spreading
code in the receiver is perfectly synchronized to the delay $\tau_k$. Let us now consider the interference term $I_{kl}$.
Substituting $h_j^{LP}(t) = \alpha_j \delta(t - \tau_j)$ into (13.43) yields



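\[
\begin{aligned}
I_{kl} &= \frac{2}{T_s} \int_0^{T_s} \left[ \sum_{\substack{j=1 \\ j \neq k}}^{K} s_{ljk}\, s_{c_j}(t) * \alpha_j \delta(t - \tau_j) \right] s_{c_k}(t - \tau_k) \cos(2\pi f_c t + \phi_k') \cos(2\pi f_c t + \phi_j')\, dt \\
&= \frac{1}{T_s} \int_0^{T_s} \left[ \sum_{\substack{j=1 \\ j \neq k}}^{K} \alpha_j s_{ljk}\, s_{c_j}(t - \tau_j) \right] s_{c_k}(t - \tau_k) \left[ \cos(\Delta\phi_{kj}) + \cos(4\pi f_c t + \phi_k' + \phi_j') \right] dt \\
&\approx \sum_{\substack{j=1 \\ j \neq k}}^{K} \alpha_j \cos(\Delta\phi_{kj})\, s_{ljk}\, \frac{1}{T_s} \int_0^{T_s} s_{c_j}(t - \tau_j)\, s_{c_k}(t - \tau_k)\, dt \\
&= \sum_{\substack{j=1 \\ j \neq k}}^{K} \alpha_j \cos(\Delta\phi_{kj})\, s_{ljk}\, \rho_{jk}(\tau_j - \tau_k),
\end{aligned}
\tag{13.45}
\]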



where $\Delta\phi_{kj} = \phi_k' - \phi_j'$ and the approximation is based on $f_c \gg 1/T_c$, so the spreading sequence is relatively
constant over a carrier period. We see from (13.45) that as with the downlink, multiuser interference in the uplink
is attenuated by the cross-correlation of the spreading codes. Since the users are typically asynchronous, $\tau_j \neq \tau_k$,
so orthogonal codes that require synchronous reception, e.g. Walsh-Hadamard codes, are not typically used on
the uplink. Another important aspect of the uplink is that the $k$th user’s symbol and multiuser interference are
attenuated by different channel gains. In particular, the $k$th user’s signal is attenuated by the gain $\alpha_k$, while the
interference from the $j$th user is attenuated by $\alpha_j$. If $\alpha_j \gg \alpha_k$ then even though the interference is reduced by
the spreading code cross-correlation, it can still significantly degrade performance.

We now consider interference-limited uplinks. Suppose initially that all users have the same received power.
Then the average SINR for asynchronous users on this channel, assuming random spreading codes with $N$ chips
per symbol, random start times, and random carrier phases, is given by [18]
\[
\mathrm{SINR} = \frac{1}{\dfrac{K-1}{3N} + \dfrac{N_0}{2E_s}}. \tag{13.46}
\]
For interference-limited systems we neglect the noise term to get the SIR
\[
\mathrm{SIR} = \frac{3N}{K-1} \approx \frac{3G}{K-1}, \tag{13.47}
\]



where $G \approx N$ is the processing gain of the system. The expressions (13.46) and (13.47) are referred to as the
standard Gaussian approximations for SINR and SIR. Care must be used in applying these approximations to
an arbitrary system, since the SIR and SINR for a given system are heavily dependent on the spreading code
properties, timing and carrier phase assumptions, and other characteristics of the system. Modifications to the standard
Gaussian approximation have been made to improve its accuracy for practical systems, but these expressions are
typically more difficult to work with and don’t lead to much greater accuracy than the standard approximations [3,
Chapter 9.6]. We can modify (13.47) to approximate the SIR associated with nonrandom spreading codes as
\[
\mathrm{SIR} = \frac{3N}{\xi(K-1)} \approx \frac{3G}{\xi(K-1)}, \tag{13.48}
\]
where $\xi$ is a constant characterizing the code cross-correlation that depends on the spreading code properties and
other system assumptions. Under the standard Gaussian assumption $\xi = 1$, whereas for PN sequences, $\xi = 2$ [39]
or $\xi = 3$ [20], depending on the system assumptions.

Suppose now that all $K - 1$ interference terms have channel gain $\alpha \gg \alpha_k$. The SIR for the $k$th user then
becomes
\[
\mathrm{SIR}(k) = \frac{\alpha_k^2\, 3N}{\alpha^2\, \xi(K-1)} \approx \frac{\alpha_k^2\, 3G}{\alpha^2\, \xi(K-1)} \ll \frac{3G}{\xi(K-1)}, \tag{13.49}
\]



so the $k$th user in the uplink suffers an SIR penalty of $\alpha_k^2/\alpha^2$ due to the different channel gains. This phenomenon
is called the near-far effect, since users far from the uplink receiver will generally have much smaller channel gains
to the receiver than the interferers. In fading the $\alpha_k$s are random, which typically reduces the code cross-correlation
and hence increases the average SIR. The effect of fading can be captured by adjusting $\xi$ in (13.48) to reflect the
average cross-correlation under the fading model. The value of $\xi$ then depends on the spreading code properties,
the system assumptions, and the fading characteristics [21].

For multipath channels, $h_j^{LP}(t) = \sum_{m=1}^{M} \alpha_{jm} \delta(t - \tau_{jm})$. Substituting this into (13.45) yields multiuser
interference on the $k$th branch of
\[
I_{kl} \approx \sum_{\substack{j=1 \\ j \neq k}}^{K} \sum_{m=1}^{M} \alpha_{jm} \cos(\Delta\phi_{jkm})\, s_{ljm}\, \rho_{jk}(\tau_{jm} - \tau_k), \tag{13.50}
\]
where we assume the $k$th branch is synchronized to a channel delay of $\tau_k$, $\Delta\phi_{jkm}$ is the relative phase offset, and
$s_{ljm}$ is the symbol transmitted by the $j$th user over time $[lT_s - \tau_k + \tau_{jm}, (l+1)T_s - \tau_k + \tau_{jm}]$, which is assumed
constant. This interference also contributes to the near-far effect, since if any of the multipath components have a
large gain relative to the $k$th user’s signal, it will degrade SIR.

A solution to the near-far effect in DSSS uplink systems is to use power control based on channel inversion,
where the $k$th user transmits signal power $P/\alpha_k^2$ so that its received signal power is $P$, regardless of its path
loss. This will lead to an SIR given by (13.48) for each user. The disadvantage of this form of power control
is that channel inversion can require very large transmit power in some fading channels (e.g. infinite power is
required in Rayleigh fading). Moreover, channel inversion can cause significant interference to other systems or
users operating on the same frequency. In particular, channel inversion can significantly increase the interference
between cells in a cellular system. Despite these problems, channel inversion is used on the mobile-to-base station
connection in the IS-95 cellular system standard.



Example 13.6: Consider a DSSS uplink system with processing gain $G = B_s/B = 100$. Assume the system is
interference-limited and there is no multipath on any user’s channel. Suppose user $k$ has a received power that
is 6 dB less than the other users. Find the number of users that the system can support under BPSK modulation
such that each user has a BER less than $10^{-3}$. Make the computation both for random codes under the standard
Gaussian assumption, $\xi = 1$, and for PN codes with $\xi = 3$.

Solution: As in the previous example, we require $\mathrm{SIR} = \gamma_b = 6.79$ dB $= 4.775$ for $P_b = 10^{-3}$. We again set the SIR
equal to the SNR $\gamma_b = 4.775$ and solve for $K$ to find the maximum number of users that the system can support.
Since this is an asynchronous system with $\alpha_k^2/\alpha^2 = 0.251$ ($-6$ dB), we have
\[
\mathrm{SIR} = \frac{\alpha_k^2\, 3N}{\alpha^2\, \xi(K-1)} = \frac{0.251(300)}{\xi(K-1)} = \frac{75.3}{\xi(K-1)}.
\]
Solving for $K$ yields $K \le 1 + 75.3/(4.775\xi) \approx 16.78$ for $\xi = 1$ and $K \le 6.26$ for $\xi = 3$, so the system can only
support between 6 and 16 users, up to a factor of 3 less than in the prior example for the downlink, due to the
asynchronicity of the uplink and the near-far effect. This example also illustrates the sensitivity of the system capacity
calculation to the assumptions about the spreading code properties as captured by $\xi$. Since the SIR is roughly
proportional to $1/(\xi K)$, the number of users the system can support for a given SIR is roughly proportional to $1/\xi$.
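The dependence of the user limit on $\xi$ and on the power disadvantage can be explored with a short script. This is a sketch based on (13.49) with $N \approx G$; the function name `max_uplink_users` and its parameter names are hypothetical, and the values are computed from the exact dB conversions rather than the rounded figures in the example.

```python
import math


def max_uplink_users(power_offset_db, processing_gain, xi, required_sir_db):
    """Solve (alpha_k^2 / alpha^2) * 3G / (xi * (K - 1)) >= required SIR for K."""
    power_ratio = 10 ** (power_offset_db / 10)     # alpha_k^2 / alpha^2
    required_sir = 10 ** (required_sir_db / 10)
    bound = 1 + power_ratio * 3 * processing_gain / (xi * required_sir)
    return bound, math.floor(bound)


for xi in (1, 3):
    bound, users = max_uplink_users(-6.0, 100, xi, 6.79)
    print(xi, round(bound, 2), users)

# Equal received powers (0 dB offset), e.g. under ideal power control, for comparison.
print(max_uplink_users(0.0, 100, 1, 6.79)[1])
```

Setting the offset to 0 dB shows how much of the loss relative to the bound in (13.48) is due to the near-far effect alone.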



13.4.4 Multiuser Detection 



Interference signals in SSMA need not be treated as noise. If the spreading code of the interference signal is
known, then this knowledge can be used to mitigate the effects of the multiple access interference (MAI). In
particular, if all users are detected simultaneously, then interference between users can be subtracted out, which
either improves performance or for a given performance allows more users to share the channel. Moreover, when
all users are detected simultaneously, the near-far effect can aid in detection, since users with strong channel gains
are more easily detected (for subsequent cancellation) than if all users had the same channel gains. A DSSS
receiver that exploits the structure of the multiuser interference in signal detection is called a multiuser detector
(MUD). MUDs are not typically used on downlink channels for several reasons. First, downlink channels are
typically synchronous, so they can eliminate all interference by using orthogonal codes, as long as the channel
doesn’t corrupt code orthogonality. Moreover, the $k$th user’s receiver in a downlink is typically limited in terms of
power and/or complexity, which makes it difficult to add complex MUD functionality. Finally, the uplink receiver
must detect the signals from all users anyway, so any receiver in the uplink is by definition a multiuser detector,
albeit not necessarily a good one. By contrast, the $k$th user’s receiver in the downlink need only detect the signal
associated with the $k$th user. For these reasons work on MUD has primarily focused on DSSS uplink systems, and
that is the focus of this section.

Multiuser detection was pioneered by Verdu in [22, 23], where the optimum joint detector for the DSSS
asynchronous uplink channel was derived. This derivation assumes an AWGN channel with different channel
gains for each user. The optimum detector for this channel chooses the symbol sequences associated with all $K$
users that minimize the MSE between the received signal and the signal that would be generated by these symbol
sequences. Because the channel is asynchronous, the entire received waveform must be processed for optimal
detection over any one symbol period. The reason is that symbols from other users are not aligned in time, hence
all symbols that overlap in the given interval of interest must be considered, and by applying the same reasoning to
the overlapping symbols, we see that it is not possible to process the signal over any finite interval and still preserve
optimality. The optimal MUD for the asynchronous case was shown in [23] to consist of a bank of $K$ single-user
matched-filter detectors, followed by a Viterbi sequence detection algorithm to jointly detect all users. The Viterbi
algorithm has $2^{K-1}$ states and complexity that grows as $2^K$, assuming binary modulation.

For synchronous users the optimal detection becomes simpler, since only one symbol interval needs to be
considered in the optimal joint detection, so sequence detection is not needed. Consider a two-user synchronous
uplink with gain $\alpha_k$ on channel $k$ and binary modulation. The complex equivalent lowpass received signal over
one bit time is
\[
r(t) = \alpha_1 b_1 s_{c_1}(t) + \alpha_2 b_2 s_{c_2}(t) + n(t), \tag{13.51}
\]
where $b_k$ is the bit transmitted by the $k$th user over the given bit time. The optimum (maximum-likelihood) detector
outputs the pair $\mathbf{b}^* = (b_1^*, b_2^*)$ that satisfies
\[
\arg\min_{b_1, b_2} \left[ \frac{1}{2\sigma^2} \int_0^{T_b} \left[ r(t) - \alpha_1 b_1 s_{c_1}(t) - \alpha_2 b_2 s_{c_2}(t) \right]^2 dt \right], \tag{13.52}
\]
where $\sigma^2$ is the noise power. This is equivalent to finding $(b_1^*, b_2^*)$ to maximize the cost function
\[
L(b_1, b_2) = \alpha_1 b_1 r_1 + \alpha_2 b_2 r_2 - \alpha_1 \alpha_2 b_1 b_2 \rho_{12}, \tag{13.53}
\]
where
\[
r_k = \int_0^{T_b} r(t)\, s_{c_k}(t)\, dt \tag{13.54}
\]
and
\[
\rho_{jk} = \int_0^{T_b} s_{c_k}(t)\, s_{c_j}(t)\, dt. \tag{13.55}
\]
This analysis easily extends to $K$ synchronous users. In this case we can express $\mathbf{r} = (r_1, \ldots, r_K)^T$ in matrix
form as [25]
\[
\mathbf{r} = \mathbf{R}\mathbf{A}\mathbf{b} + \mathbf{n}, \tag{13.56}
\]
where $\mathbf{b} = (b_1, \ldots, b_K)^T$ is the bit vector associated with the $K$ users over the given bit time, $\mathbf{A}$ is a diagonal
$K \times K$ matrix of the channel gains $\alpha_k$, and $\mathbf{R}$ is a $K \times K$ matrix of the cross-correlations between the spreading
codes. The optimal choice of bit sequence $\mathbf{b}^*$ is obtained by choosing the sequence to maximize the cost function
\[
L(\mathbf{b}) = 2\mathbf{b}^T \mathbf{A}\mathbf{r} - \mathbf{b}^T \mathbf{A}\mathbf{R}\mathbf{A}\mathbf{b}. \tag{13.57}
\]



Unfortunately maximizing (13.57) for $K$ users also has complexity that grows as $2^K$, the same as in the
asynchronous case, assuming a search tree is used for the optimization. In addition to its high complexity, the
optimal detector has the drawback of requiring knowledge of the channel amplitudes $\alpha_k$.
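The exponential complexity is easy to see in a brute-force implementation of the synchronous optimal detector. The sketch below evaluates the cost function (13.57) for every one of the $2^K$ candidate bit vectors; it uses an exhaustive loop rather than a search tree, and the function name `ml_detect` and the numerical values are illustrative assumptions.

```python
from itertools import product


def ml_detect(r, R, gains):
    """Exhaustively maximize L(b) = 2 b^T A r - b^T A R A b over b in {-1, +1}^K."""
    K = len(r)
    best_bits, best_cost = None, float("-inf")
    for bits in product((-1, 1), repeat=K):              # 2^K candidates
        ab = [gains[k] * bits[k] for k in range(K)]      # the vector A b
        linear = 2 * sum(ab[k] * r[k] for k in range(K))
        quadratic = sum(ab[j] * R[j][k] * ab[k] for j in range(K) for k in range(K))
        cost = linear - quadratic
        if cost > best_cost:
            best_bits, best_cost = bits, cost
    return list(best_bits)


# Two users with cross-correlation 0.5; user 2 is received twice as strongly.
R = [[1.0, 0.5], [0.5, 1.0]]
gains = [1.0, 2.0]
r = [0.0, -1.5]          # noise-free r = R A b for b = (1, -1)
print(ml_detect(r, R, gains))
```

In this example the matched-filter output for user 1 is exactly zero, so a conventional single-user detector has no information about that bit, while the joint detector recovers both bits.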

The complexity of MUD can be decreased at the expense of optimality. Many suboptimal MUDs have been 
developed with various tradeoffs with respect to performance, complexity, and requirements regarding channel 
knowledge. Suboptimal MUDs fall into two broad categories: linear and nonlinear. Linear MUDs apply a linear 
operator or filter to the output of the matched filter bank in Figure 13.16. These linear detectors have complexity 
that is linear in the number of users, a significant complexity improvement over the optimal detector. The most 
common linear MUDs are the decorrelating detector [24] and the MMSE detector. The decorrelating detector 
simply inverts the matrix R of cross-correlations, resulting in 

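\[
\mathbf{b}^* = \mathbf{R}^{-1}\mathbf{r} = \mathbf{R}^{-1}\left[ \mathbf{R}\mathbf{A}\mathbf{b} + \mathbf{n} \right] = \mathbf{A}\mathbf{b} + \mathbf{R}^{-1}\mathbf{n}. \tag{13.58}
\]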



The inverse exists for most cases of interest. In the absence of noise, the resulting bit sequence equals the original 
sequence, scaled by the channel gains. In addition to its simplicity, this detector has other appealing features: it 
completely removes MAI, and it does not require knowledge of the channel gains. However, the decorrelating 
detector can lead to noise enhancement, since the noise vector gets multiplied by the matrix inverse. Thus, decor- 
relating MUD is somewhat analogous to zero-forcing equalization described in Chapter 11.4.1: all MAI can be 
removed, but at the expense of noise enhancement. 
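A decorrelating detector amounts to solving the linear system $\mathbf{R}\mathbf{x} = \mathbf{r}$ and taking the sign of each entry. The following sketch uses a small Gauss-Jordan solver so that it needs no external libraries; the function names and numerical values are illustrative assumptions, and a practical receiver would use a numerical linear algebra routine instead.

```python
def solve(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [[float(v) for v in matrix[i]] + [float(vector[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def decorrelating_detect(r, R):
    """Apply R^{-1} to the matched-filter outputs, then slice to +/-1 bits."""
    soft = solve(R, r)                     # equals A b + R^{-1} n, as in (13.58)
    return soft, [1 if x >= 0 else -1 for x in soft]


R = [[1.0, 0.5], [0.5, 1.0]]
soft, bits = decorrelating_detect([0.0, -1.5], R)   # noise-free r for A b = (1, -2)
print(soft, bits)
```

Note that the channel gains never enter the computation: in the noise-free case the soft outputs are the bits scaled by the gains, and only their signs are needed.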

The MMSE detector finds the matrix $\mathbf{D}$ such that multiplication of the filter bank output by $\mathbf{D}$ minimizes the
expected MSE between $\mathbf{D}\mathbf{r}$ and the transmitted bit sequence $\mathbf{b}$. In other words, the matrix $\mathbf{D}$ satisfies
\[
\arg\min_{\mathbf{D}} E\left[ (\mathbf{b} - \mathbf{D}\mathbf{r})^T (\mathbf{b} - \mathbf{D}\mathbf{r}) \right]. \tag{13.59}
\]
The optimizing $\mathbf{D}$ is given by [17, 25, 26]
\[
\mathbf{D} = \left( \mathbf{R} + 0.5 N_0 \mathbf{I} \right)^{-1}. \tag{13.60}
\]
Note that in the absence of noise, the MMSE detector is the same as the decorrelating detector. However, it has
better performance at low SNRs, since it balances removal of the MAI with noise enhancement. This is analogous
to the MMSE equalizer design for ISI channels, described in Chapter 11.4.2.
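For two users the matrix in (13.60) can be inverted in closed form, which makes the relation to the decorrelating detector explicit. The sketch below assumes unit-energy codes, so that $\mathbf{R}$ has ones on its diagonal and a single cross-correlation `rho` off the diagonal; the function name and the numerical values are illustrative.

```python
def mmse_two_user(r, rho, n0):
    """Apply D = (R + 0.5 * N0 * I)^{-1} from (13.60) for R = [[1, rho], [rho, 1]]."""
    d = 1.0 + 0.5 * n0                  # diagonal entry of R + 0.5 N0 I
    det = d * d - rho * rho
    soft = [(d * r[0] - rho * r[1]) / det,
            (-rho * r[0] + d * r[1]) / det]
    return soft, [1 if x >= 0 else -1 for x in soft]


r = [0.0, -1.5]                         # noise-free outputs for A b = (1, -2), rho = 0.5
print(mmse_two_user(r, rho=0.5, n0=0.0))   # N0 = 0: same soft outputs as the decorrelator
print(mmse_two_user(r, rho=0.5, n0=0.5))   # N0 > 0: outputs shrink, MAI not fully removed
```

As $N_0$ grows the matrix approaches a scaled identity, so the detector moves from the decorrelator toward the conventional matched-filter bank.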

Nonlinear MUDs have somewhat larger complexity than the linear detectors but also much better perfor- 
mance, although not necessarily in all cases, especially with little or no coding [39]. The most common nonlinear 
MUD techniques are multistage detection, decision-feedback detection, and successive interference cancellation.
In a multistage detector, each stage consists of the conventional matched-filter bank. The nth stage of the detector
uses decisions of the (n - 1)st stage to cancel the MAI at its input. The multistage detector can be applied to either
synchronous [28] or asynchronous [27] systems. The decision-feedback detector is based on the same premise 
as a decision-feedback equalizer. It consists of a feedforward and feedback filter, where the feedforward filter is 
the Cholesky factorization of the correlation matrix R. The decision feedback MUD can be designed for either 
synchronous [29] or asynchronous [30] systems. These detectors require knowledge of the channel gains and can 
also suffer from error propagation when decision errors are fed back through the feedback filter. In interference
cancellation an estimate of one or more users is made and the MAI caused to other users is subtracted off [19]. 
Interference cancellation can be done in parallel, where all users are detected simultaneously and then cancelled 
out [32, 33], or sequentially, where users are detected one at a time and then subtracted out from users yet to be 
detected [34]. Parallel cancellation has a lower latency and is more robust to decision errors. However, its perfor- 
mance suffers due to the near-far effect, when some users have much weaker received powers than others. Under 
this unequal power scenario, successive interference cancellation can outperform parallel cancellation [25]. In fact, 
successive cancellation achieves Shannon capacity of the uplink channel, as will be discussed in Chapter 14, and 
can approach Shannon capacity in practice [35]. Successive interference cancellation suffers from error propa- 
gation, which can significantly degrade performance, but this degradation can be partially offset through power 
control [36]. 
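Successive interference cancellation under unequal received powers can be sketched in a few lines. The code below is a simplified, noiseless illustration and not a full receiver: it assumes a synchronous system with matched-filter outputs r = RAb, known received amplitudes, and detection ordered from strongest to weakest user. The function names and numerical values are hypothetical.

```python
def matched_filter_outputs(R, amps, bits):
    """Noiseless matched-filter outputs r = R A b for K synchronous users."""
    K = len(bits)
    return [sum(R[j][k] * amps[k] * bits[k] for k in range(K)) for j in range(K)]


def conventional_detect(r):
    """Conventional receiver: take the sign of each matched-filter output."""
    return [1 if x >= 0 else -1 for x in r]


def sic_detect(R, amps, r):
    """Successive interference cancellation, strongest user first."""
    K = len(r)
    residual = list(r)
    decisions = [0] * K
    for k in sorted(range(K), key=lambda i: amps[i], reverse=True):
        decisions[k] = 1 if residual[k] >= 0 else -1
        # Subtract user k's estimated contribution from the other outputs.
        for j in range(K):
            if j != k:
                residual[j] -= R[j][k] * amps[k] * decisions[k]
    return decisions


# Near-far example (assumed values): user 1 is received four times stronger.
R = [[1.0, 0.6], [0.6, 1.0]]
amps = [0.5, 2.0]
bits = [+1, -1]
r = matched_filter_outputs(R, amps, bits)

print("transmitted: ", bits)
print("conventional:", conventional_detect(r))
print("SIC:         ", sic_detect(R, amps, r))
```

Here the conventional receiver gets the weak user's bit wrong because the strong user's MAI dominates its matched-filter output, whereas cancelling the strong user first leaves the weak user's signal exposed. If the first decision were wrong, the subtraction would add interference instead of removing it, which is the error propagation mentioned above.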

A comprehensive treatment of different MUDs and their performance can be found in [17], and shorter tu- 
torials are provided in [25, 19]. Combined equalization and MUD is treated in [37]. MUD for multirate CDMA, 
where different users have different data rates, is analyzed in [38]. Blind, space-time, and turbo multiuser detectors 
are developed in [40]. Spectral efficiencies of the different detectors have been analyzed in [39]. 

13.4.5 Multicarrier CDMA 

Multicarrier CDMA (MC-CDMA) is a technique that combines the advantages of OFDM and CDMA. It is very
effective both at combating ISI and as a mechanism for allowing multiple users to share the same channel. The basic
block diagram for a baseband single-user multicarrier CDMA system is shown in Figure 13.17. The data symbol
$s_l$ is sent over all N subchannels. On the ith subchannel $s_l$ is multiplied by the ith chip $c_i$ of a spreading sequence
$s_c(t)$, where $c_i = \pm 1$. This is similar to the standard spread spectrum technique, except that multiplication with
the spreading sequence is done in the frequency domain rather than in the time domain. The frequency-spread
data $(s_l c_1, s_l c_2, \ldots, s_l c_N)$ is then multicarrier modulated in the standard manner: the parallel sequence is passed
through an IFFT and a parallel-to-serial converter, and is then D/A converted to produce the modulated signal $s(t)$, where $S(f)$
is as shown in Figure 13.17 for subchannel carrier frequencies $(f_1, \ldots, f_N)$.

Assume the MC-CDMA signal is transmitted through a frequency-selective channel with a constant channel
gain of $\alpha_i$ on the ith subchannel and AWGN $n(t)$. The receiver performs the reverse operations of the transmitter,
passing the received signal through an A/D converter, a serial-to-parallel converter, and an FFT to recover the
symbol transmitted along the ith subchannel. The subchannel symbol received on the ith subchannel is multiplied
by the ith chip $c_i$ and a weighting factor $\beta_i$, then these terms are summed together for the final symbol estimate $\hat{s}_l$.

In a multiuser MC-CDMA system, each user modulates his signal as in Figure 13.17, but using a different
spreading code $s_{c_k}(t)$. So for a two-user system, user 1 would use the spreading code $s_{c_1}(t)$ with chips $(c_1^1, \ldots, c_N^1)$,
resulting in a transmitted signal $s_1(t)$, and user 2 would use the spreading code $s_{c_2}(t)$ with chips $(c_1^2, \ldots, c_N^2)$,
resulting in a transmitted signal $s_2(t)$. If the users transmit simultaneously, their signals are added “in the air”
as shown in Figure 13.18, where $s_l^1$ is the symbol corresponding to user 1 over the lth symbol time and $s_l^2$ is the
symbol corresponding to user 2 over this symbol time. The interference between users in this system is reduced by
the cross-correlation of the spreading codes, as in standard spread spectrum without multicarrier. However, each
user benefits from the frequency diversity of spreading its signal over independently fading subchannels. This
typically leads to better performance than in standard spread spectrum.

Figure 13.17: Multicarrier CDMA System. (Panels: MC-CDMA Modulator, with transmit spectrum $S(f)$ over subcarriers $f_1, \ldots, f_N$; MC-CDMA Demodulator.)
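The frequency-domain spreading and weighted despreading described in this section can be illustrated with a small noiseless sketch. It makes several simplifying assumptions that are not from the text: the IFFT/FFT stages are treated as ideal, so each subchannel reduces to a scalar gain; both users see the same subchannel gains; the codes are two length-4 orthogonal sequences; and the weights are chosen either as equal weights or as inverse-gain weights 1/(N alpha_i). All names and values are hypothetical.

```python
def spread(symbol, chips):
    """Frequency-domain spreading: copy the symbol onto every subchannel, times its chip."""
    return [symbol * c for c in chips]


def channel(tx_per_user, gains):
    """Add the users' subchannel signals and apply the per-subchannel gain alpha_i."""
    return [g * sum(column) for g, column in zip(gains, zip(*tx_per_user))]


def despread(rx, chips, weights):
    """Multiply subchannel i by chip c_i and weight beta_i, then sum over subchannels."""
    return sum(w * c * x for w, c, x in zip(weights, chips, rx))


N = 4
c1 = [1, 1, 1, 1]            # assumed code for user 1
c2 = [1, -1, 1, -1]          # assumed code for user 2 (orthogonal to c1)
gains = [1.0, 0.5, 2.0, 0.25]  # assumed subchannel gains alpha_i
s1, s2 = +1, -1

rx = channel([spread(s1, c1), spread(s2, c2)], gains)

equal_w = [1.0 / N] * N
inverse_w = [1.0 / (N * g) for g in gains]

print("equal weights:  ", despread(rx, c1, equal_w), despread(rx, c2, equal_w))
print("inverse weights:", despread(rx, c1, inverse_w), despread(rx, c2, inverse_w))
```

With unequal subchannel gains the codes are no longer orthogonal at the receiver, so equal weights leave residual interference in each estimate. Inverse-gain weights restore orthogonality in this noiseless case and return the transmitted symbols, but they would amplify noise on weak subchannels, which this sketch does not model.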

13.5 Multiuser FHSS Systems 

Multiuser FHSS is accomplished by assigning each user a unique spreading code sequence $s_{c_i}(t)$ to generate its
hop pattern. If the spreading codes are orthogonal and the users are synchronized in time, then the different users
never collide, and the performance of each user is the same as in a single-user FH system. However, if the users
are asynchronous or non-orthogonal codes are used, then multiple users will collide by occupying a given channel
simultaneously. The symbols transmitted during that time are very likely to be in error, so multiuser FHSS typically
uses error correction coding to compensate for collisions.
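The effect of synchronization on collisions can be seen with a toy hop-pattern model. In this sketch, which is an illustration under assumed parameters rather than a description of any real system, each user's pattern is a common base pattern offset by the user index modulo the number of channels, and asynchronism is modeled crudely as a cyclic delay of a whole number of hops. The base pattern and function names are hypothetical.

```python
def hop_pattern(base, user, num_channels):
    """Hop pattern for one user: the base pattern shifted by the user index."""
    return [(h + user) % num_channels for h in base]


def delay(pattern, hops):
    """Cyclically delay a hop pattern by a whole number of hops."""
    hops %= len(pattern)
    return pattern[-hops:] + pattern[:-hops] if hops else list(pattern)


def count_collisions(p1, p2):
    """Number of hop intervals in which both users occupy the same channel."""
    return sum(1 for a, b in zip(p1, p2) if a == b)


M = 8                                # number of frequency channels (assumed)
base = [0, 1, 5, 2, 3, 7, 4, 6]      # assumed base hop pattern
user0 = hop_pattern(base, 0, M)
user1 = hop_pattern(base, 1, M)

print("synchronized collisions:", count_collisions(user0, user1))
print("one-hop offset collisions:", count_collisions(user0, delay(user1, 1)))
```

When the users are aligned, the patterns differ in every hop interval, so no collisions occur. Once one user is delayed by a hop, some intervals coincide, and the symbols sent in those intervals would need to be recovered by error correction coding.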

Multiuser FHSS, also referred to as FH-CDMA or FH-SSMA, is mainly applied to uplink channels. This
access method is the preferred method for military applications due to the anti-jam protection and low probability
of interception and detection inherent to FH systems. FH-CDMA was also proposed as a candidate for second-generation
digital cellular systems [2, Chapter 9.4], but was not adopted. Tradeoffs between FH and DS for multiple
access are discussed in [41, 42] and [5, Chapter 11]. In fact, most analyses indicate that FH-CDMA is inferior to DS-CDMA
as a multiple access method in terms of the number of users that can share the channel simultaneously, at
least under asynchronous operation. In addition, FH-CDMA systems typically cause interference at much larger
distances than DS-CDMA systems, since the interference isn’t mitigated by bandwidth spreading [5, Chapter 11].
However, hybrid techniques using FH along with another multiple access method are used to exploit FH benefits.
For example, the GSM digital cellular standard uses a combination of time-division and slow frequency-hopping,
where the frequency-hopping is used primarily to average out interference from other cells. FH-CDMA is also
used in the Bluetooth system. Bluetooth operates in the unlicensed 2.4 GHz band, and FH was chosen since it can
be used with non-coherent FSK modulation, which is a low-cost energy-efficient modulation technique.

Figure 13.18: Two-User Multicarrier CDMA System.
Chapter 13 Problems



1. In this problem we derive the SIR (13.6) for a randomly spread signal with interference. The correlator
output of the ith receiver branch in this system is given by (13.5) as

$$x_i = \int_0^T x(t) s_i(t)\,dt = \sum_{j=1}^{N} \left(s_{ij}^2 + I_j s_{ij}\right).$$

(a) Show that the conditional expectation of $x_i$, conditioned on the transmitted signal $s_i(t)$, is $E[x_i | s_i(t)] = E_s$.

(b) Show that with equiprobable signaling, $p(s_i(t)) = 1/M$, $E[x_i] = E_s/M$.

(c) Show that $\mathrm{Var}[x_i | s_i(t)] = E_s E_j/N$.

(d) Show that, again with equiprobable signaling, $\mathrm{Var}[x_i] = E_s E_j/(NM)$.

(e) The SIR is given by

$$\mathrm{SIR} = \frac{E[x_i]^2}{\mathrm{Var}[x_i]}.$$

Show that

$$\mathrm{SIR} = \frac{E_s}{E_j} \times \frac{N}{M}.$$



2. Sketch the transmitted DSSS signal $s(t)s_c(t)$ over two bit times $[0, 2T_b]$, assuming that $s(t)$ is BPSK modulated
with carrier frequency 100 MHz and $T_s = 1$ μsec. Assume the first data bit equals a one and the
second data bit equals a zero. Also assume there are 10 chips per bit and the chips alternate between ±1,
with the first chip equal to +1.



3. Consider a FH system transmitted over a two-path channel, where the reflected path has delay $\tau = 10$ μsec
relative to the LOS path. Assume the receiver is synchronized to the hopping of the LOS path.

(a) For what hopping rates will the system exhibit no fading?

(b) Assume a FFH system with hop time $T_c = 50$ μsec and symbol time $T_s = .5$ msec. Will this system
exhibit no fading, flat fading, or frequency-selective fading?

(c) Assume a SFH system with hop time $T_c = 50$ μsec and symbol time $T_s = .5$ μsec. Does this system
exhibit no fading, flat fading, or frequency-selective fading?



4. In this problem we explore the statistics of the DSSS receiver noise after it is multiplied by the spreading
sequence. Let $n(t)$ be a random noise process with autocorrelation function $\rho_n(\tau)$ and $s_c(t)$ a zero-mean
random spreading code, independent of $n(t)$, with autocorrelation $\rho_c(\tau)$. Let $n'(t) = n(t)s_c(t)$.

(a) Find the autocorrelation and PSD of $n'(t)$.

(b) Show that if $\rho_c(\tau) = \delta(\tau)$, then $n'(t)$ is zero mean with autocorrelation function $\rho_n(\tau)$, i.e. it has the
same statistics as $n(t)$, so the statistics of $n(t)$ are not affected by its multiplication with $s_c(t)$.

(c) Find the autocorrelation $\rho_{n'}(\tau)$ of $n'(t)$ if $n(t)$ is zero-mean AWGN and $s_c(t)$ is a maximal linear code
with autocorrelation given by (13.19). What happens to $\rho_{n'}(\tau)$ as $N \to \infty$ in (13.19)?

5. Show that for any real periodic spreading code $s_c(t)$, its autocorrelation $\rho_c(\tau)$ over one period is symmetric
about $\tau = 0$ and reaches its maximum value at $\tau = 0$.
6. Show that if $s_c(t)$ is periodic with period T, then the autocorrelation for time-shifted versions of the spreading
code depends only on the difference of their time shifts:

$$\frac{1}{T}\int_0^T s_c(t - \tau_0) s_c(t - \tau_1)\,dt = \rho_c(\tau_1 - \tau_0).$$

7. Show that for any periodic spreading code $s_c(t)$ with period T, its autocorrelation $\rho_c(\tau)$ is periodic with the
same period.

8. Show that the power spectral density of $s_c(t)$ for a maximal linear spreading code with period $NT_c = T_s$ is
given by

$$P_{s_c}(f) = \sum_{m=-\infty}^{\infty} \frac{N+1}{N^2}\,\mathrm{sinc}^2\!\left(\frac{m}{N}\right)\delta\!\left(f - \frac{m}{T_s}\right).$$

Also plot this spectrum.

9. Show that both m-sequences and random binary spreading sequences have the balanced, run length, and
shift properties.

10. Suppose an unmodulated carrier $s(t)$ is spread using a maximal linear code $s_c(t)$ with period T and then
transmitted over a channel with impulse response $h(t) = \alpha_0\delta(t - \tau_0) + \alpha_1\delta(t - \tau_1)$. The corresponding
received signal $r(t)$ is input to the synchronization loop shown in Figure 13.9. Find the function $w(\tau)$
output from the integrator in this loop as a function of $\tau$. What will determine which of these two multipath
components the coarse acquisition loop locks to?

11. What is the outage probability relative to $P_b = 10^{-6}$ for a three-branch RAKE receiver with DPSK signal
modulation, independent Rayleigh fading on each branch, and a branch SNR/bit prior to despreading of 10
dB? Assume the code autocorrelation associated with maximal linear codes with $K = N = 2^n - 1 = 15$.
Assume also that the code in the first branch is perfectly aligned, but that the code in the second branch is
offset by $T_c/4$ and the code in the third branch is offset by $T_c/3$. Assume selection-combining diversity.

12. Consider a spread spectrum signal transmitted over a multipath channel with a LOS component and a single
multipath component, where the delay of the multipath relative to the LOS is greater than the chip time
$T_c$. Consider a 2-branch RAKE receiver, with one branch corresponding to the LOS component and the
other to the multipath component. Assume that with perfect synchronization in both branches, the incoming
signal component at each branch after despreading has power which is uniformly distributed between six
and twelve milliwatts. The total noise power in the despread signal bandwidth is 1 mW. Suppose, however,
that only the first branch is perfectly synchronized, while the second branch has a timing offset of $T_c/2.366$.
The code autocorrelation is that of a maximal linear code, with $N \gg 1$. The two branches of the RAKE
are combined using maximal ratio combining with knowledge of the timing offset.

(a) What is the average SNR at the combiner output?

(b) What is the distribution of the combiner output SNR?

(c) What is the outage probability for DPSK modulation with a BER of $10^{-4}$?

13. This problem illustrates the benefits of RAKE receivers and the optimal choice of multipath components for
combining when the receiver complexity is limited. Consider a multipath channel with impulse response

$$h(t) = \alpha_0\delta(t) + \alpha_1\delta(t - \tau_1) + \alpha_2\delta(t - \tau_2).$$

The $\alpha_i$ are Rayleigh fading coefficients, but their expected power varies due to shadowing such that $E[\alpha_0^2] = 5$
with probability .5 and 10 with probability .5, $E[\alpha_1^2] = 0$ with probability .5 and 20 with probability .5,
and $E[\alpha_2^2] = 5$ with probability .75 and 10 with probability .25 (all units are linear). The transmit power
and noise power are such that a spread spectrum receiver locked to the ith multipath component will have an
SNR of $\alpha_i^2$ in the absence of the other multipath components.

(a) Assuming maximal linear codes, a bit time $T_b$, and a spread spectrum receiver locked to the LOS signal
component (with zero delay and gain $\alpha_0$), for what values of $\tau_1$ and $\tau_2$, $0 < \tau_1 < \tau_2 < T_b$, will their
corresponding multipath components be attenuated by $-1/N$, where N is the number of chips per bit?

For the rest of the problem assume spreading codes with autocorrelation equal to a delta function.

(b) What is the outage probability of DPSK modulation at an instantaneous $P_b = 10^{-3}$ for a single-branch
spread spectrum receiver locked to the LOS path?

(c) What is the outage probability of DPSK modulation at an instantaneous $P_b = 10^{-3}$ for a 3-branch
RAKE receiver where each branch is locked to one of the multipath components and SC is used to
combine the paths?

(d) Suppose receiver complexity is limited such that only a 2-branch RAKE with SC can be built. Find
which two multipath components the RAKE should lock to in order to minimize the outage probability
of DPSK modulation at $P_b = 10^{-3}$, and find this minimum outage probability.

14. This problem investigates the performance of a RAKE receiver when the multipath delays are random.
Consider a DS spreading code with chip time $T_c$ and autocorrelation function

$$\rho_c(\tau) = \begin{cases} 1 & -T_c/2 < \tau < T_c/2 \\ 0 & \text{else} \end{cases}$$

Suppose you use this spreading code to modulate a DPSK signal with bit time $T_b = 10T_c$. You transmit the
spread signal over a multipath channel, where the channel is modelled using Turin’s discrete-time tapped
delay model (see Chapter 3 of Reader) with a tap separation of $T_c$ and a total multipath spread $T_m = 5T_c$.
Thus, the model has five multipath “bins”, where the ith bin has at most one multipath component of delay
$(i - .5)T_c$. The distribution of the multipath component in each bin is independent of the components in
all the other bins. The probability of observing a multipath component in bin i is .75 and, conditioned on
having a multipath component in bin i, the amplitude of the ith multipath component after despreading is
Rayleigh distributed with average SNR/bit of $S_i = 20/i$, $i = 1, 2, \ldots, 5$ (in linear units). Thus, the average
power is decreasing relative to the distance that the multipath component has traveled. 

At the receiving end you have a five-branch selection-diversity RAKE receiver with each branch synchronized
to one of the multipath bins. Assuming a target BER of $10^{-3}$, compute the outage probability of the
RAKE receiver output. Compare this with the outage probability for the same BER if there is (with probability
one) a multipath component in each bin, and each multipath component after despreading has an average
SNR/bit of $S_i = 20$ (linear units).

15. Direct sequence spread spectrum signals are often used to make channel measurements, since the wideband
spread spectrum signal has good resolution of the individual multipath components in the channel. Channel
measurement with spread spectrum, also called channel sounding, is done using a receiver with multiple
branches synchronized to the different chip delays. Specifically, an unmodulated spreading sequence $s_c(t)$
is sent through the channel $h(t)$, as shown in the figure below. The receiver has N + 1 branches synchronized
to different chip delays. The output from the ith branch approximates the channel gain associated with
that delay, so that the approximate channel model obtained by the channel sounding shown in the figure is

$$\hat{h}(t) = \sum_{i=0}^{N} \beta_i \delta(t - iT_c).$$

[Figure: channel sounder receiver; the final branch is synchronized to $s_c(t - NT_c)$.]

Assume that the autocorrelation function for $s_c(t)$ is

$$\rho_c(\tau) = \frac{1}{T_b}\int_0^{T_b} s_c(t) s_c(t - \tau)\,dt = \begin{cases} 1 - |\tau|/T_c & |\tau| < T_c \\ 0 & |\tau| \geq T_c \end{cases}$$

(a) Show that if $h(t) = \sum_{i=0}^{N} \alpha_i \delta(t - iT_c)$ then in the absence of noise ($n(t) = 0$) the channel sounder
above will output $\beta_i = \alpha_i$ for all i.

(b) Again neglecting noise, if $h(t) = a\delta(t) + b\delta(t - 1.2T_c) + c\delta(t - 3.5T_c)$, what approximation $\hat{h}(t)$ will
be obtained by the channel sounder?

(c) Now assume that the channel sounder yields a perfect estimate of the channel $\hat{h}(t) = h(t) = \beta_0\delta(t) +
\beta_1\delta(t - T_c) + \beta_2\delta(t - 2T_c)$, where the $\beta_i$s are all independent Rayleigh fading random variables. Consider
a 3-branch RAKE receiver with the ith branch perfectly synchronized to the ith multipath component
of $h(t)$, with an average SNR/bit on each branch after despreading of 10 dB. Find $P_{out}$ for DPSK
modulation with a target BER of $10^{-3}$ under maximal-ratio combining in the RAKE. Do the same
calculation for selection combining in the RAKE.

16. Find the values of the autocorrelation and crosscorrelation for Gold codes, Kasami codes from the small set, 
and Kasami codes from the large set for n = 8. Also, find the number of such Kasami codes for both the 
small set and the large set. 
17. Find the Hadamard matrix for N = 4 and show that the spreading codes generated by the rows of this matrix
are orthogonal assuming synchronous users (i.e. show $\rho_{ij}(0) = 0\ \forall i \neq j$). Also find the cross-correlation
between all pairs of users assuming a timing offset of $T_c/2$ between users, i.e. find

$$\rho_{ij}(T_c/2) = \frac{1}{T_s}\int_0^{T_s} s_{c_i}(t) s_{c_j}(t - T_c/2)\,dt = \frac{1}{N}\sum_{n=1}^{N} s_{c_i}(nT_c) s_{c_j}(nT_c - .5T_c),$$

for all pairs of codes.

18. Consider an asynchronous DSSS MAC system with bandwidth expansion $N = B_s/B = 100$ and K = 40
users. Assume the system is interference-limited and there is no multipath on any user’s channel. Find the 
probability of error for user k under BPSK modulation, assuming random codes with the standard Gaussian 
assumption, and assuming this user is in a deep fade, with received power that is 6 dB less than the other 
users. Would this change if the users could be synchronized? 

19. Show that the vector $r = (r_1, \ldots, r_K)^T$ for $r_k$ given by (13.54) can be expressed by the matrix equation
(13.56). What are the statistics of n in this expression?

20. Show that the maximum-likelihood detector for a K-user synchronous MAC receiver chooses the vector b to
maximize the cost function given by (13.57).

21. This problem illustrates the use of multiple spreading codes in single-user CDMA systems for adaptive
modulation or diversity gain. The BER for user k in a K-user DS-CDMA system where each user transmits
his BPSK modulated bit sequence at a rate R b/s along his spreading code is given by:

$$\mathrm{BER}_k = Q\left(\sqrt{\frac{2 S_k(\gamma_k)\gamma_k}{\frac{1}{N}\sum_{i=1, i \neq k}^{K} S_i(\gamma_i)\gamma_i + 1}}\right), \qquad (13.61)$$

where N is the spreading factor (processing gain), $\gamma_i$ is the ith user’s channel power gain, and $S_i(\gamma_i)$ is
his transmit power when his channel gain is $\gamma_i$. Note that noise power has been normalized to unity and
the receiver demodulates each spreading sequence treating other sequences as noise (conventional receiver).
The system has a single user that can simultaneously transmit up to two spreading sequences, modulating
each with an independent BPSK bit stream at rate R b/s (on each stream).

(a) Assume that the user’s channel fade is $\gamma$. Assume that the user splits his total transmit power $S(\gamma)$
equally among the transmitted sequences. Note that the user has three options: he can transmit nothing,
one BPSK modulated spreading sequence, or both spreading sequences BPSK modulated with
independent bits. Based on the BER expression for the multiuser case (13.61), explain why the BER
for the single-user multirate DS-CDMA system is given by

$$\mathrm{BER} = Q\left(\sqrt{2 S(\gamma)\gamma}\right) \qquad (13.62)$$

if he transmits only one spreading sequence and

$$\mathrm{BER} = Q\left(\sqrt{\frac{S(\gamma)\gamma}{\frac{S(\gamma)\gamma}{2N} + 1}}\right)$$

when he transmits both spreading sequences together. What are the rates achieved in both of these
cases?

(b) Assume the channel is known perfectly to the transmitter as well as the receiver, and that $\gamma$ is distributed
according to the distribution $p(\gamma)$. We want to develop an adaptive rate and power strategy for this
channel. Since we do not use error correction coding, the user needs to keep his BER below a threshold
$P_b^0$ for all transmitted bits. Assume an average transmit power constraint of unity: $\int_0^\infty S(\gamma)p(\gamma)\,d\gamma = 1$.
Since we have a finite discrete set of possible rates, as with narrowband adaptive modulation the
optimal adaptive rate policy is to send no data when $\gamma$ is below a cutoff threshold $\gamma_0$, one data stream
when $\gamma_0 < \gamma < \gamma_1$, and both data streams when $\gamma > \gamma_1$. Find the power adaptation strategy that
exactly meets the BER target $P_b^0$ for this adaptive rate strategy as a function of the thresholds $\gamma_0$ and
$\gamma_1$.

(c) Given the adaptive rate and power strategy obtained in part (b), solve the Lagrangian optimization to
find $\gamma_0$ and $\gamma_1$ as a function of the Lagrangian $\lambda$.

22. You work for a company, WirelessToGo, that wants to design a fourth generation (4G) cellular system for
voice plus high speed data. The FCC has decided to allocate 100 MHz of spectrum for this system based on
whatever standard is agreed to by the various industry players. You have been charged with designing the
system and pushing your design through the standards body. You should describe your design in as much
detail as possible, paying particular attention to how your design will combat the impact of fading and ISI,
as well as its ability to accommodate both voice and data. Also develop arguments as to why your design is
better than competing strategies.
Chapter 14

Multiuser Systems



In multiuser systems the system resources must be divided among multiple users. This chapter develops techniques
to allocate resources among multiple users, as well as the fundamental capacity limits of multiuser systems. We
know from Chapter 5.1.2 that signals of bandwidth B and time duration T occupy a signal space of dimension
2BT. In order to support multiple users, the signal space dimensions of a multiuser system must be allocated to the
different users¹. Allocation of signaling dimensions to specific users is called multiple access². Multiple access
methods perform differently in different multiuser channels, and we will apply these methods to the two basic
multiuser channels, downlink channels and uplink channels. Because signaling dimensions can be allocated to
different users in an infinite number of different ways, multiuser channel capacity is defined by a rate region rather
than a single number. This region describes all user rates that can be simultaneously supported by the channel with
arbitrarily small error probability. We will discuss multiuser channel capacity regions for both the uplink and the
downlink. We also consider random access techniques, whereby signaling dimensions are only allocated to active
users, as well as power control, which ensures that users maintain the SINR required for acceptable performance.
The performance benefits of multiuser diversity, which exploits the time-varying nature of the users’ channels, are
also described. We conclude with a discussion of the performance gains and signaling techniques associated with
multiple antennas in multiuser systems.

14.1 Multiuser Channels: The Uplink and Downlink 

A multiuser channel refers to any channel that must be shared among multiple users. There are two different types
of multiuser channels: the uplink channel and the downlink channel, which are illustrated in Figure 14.1. A downlink,
also called a broadcast channel or forward channel, has one transmitter sending to many receivers. Since the
signals transmitted to all users originate from the downlink transmitter, the transmitted signal $s(t) = \sum_{k=1}^{K} s_k(t)$,
with total power P and bandwidth B, is the sum of signals transmitted to all K users. Thus, the total signaling
dimensions and power of the transmitted signal must be divided among the different users. Synchronization of
the different users is relatively easy in the downlink since all signals originate from the same transmitter, although
multipath in the channel can corrupt this synchronization. Another important characteristic of the downlink is that
both signal and interference are distorted by the same channel. In particular, user k’s signal $s_k(t)$ and all interfering
signals $s_j(t), j \neq k$, pass through user k’s channel $h_k(t)$ to arrive at user k’s receiver. This is a fundamental difference
between the uplink and the downlink, since in the uplink signals from different users are distorted by different
channels. Examples of wireless downlinks include all radio and television broadcasting, the transmission link from
a satellite to multiple ground stations, and the transmission link from a base station to the mobile terminals in a
cellular system.

¹ Allocation of signaling dimensions through either multiple access or random access is performed by the Medium Access Control layer
in the Open Systems Interconnection (OSI) network model [1, Chapter 1.3].

² The dimensions allocated to the different users need not be orthogonal, as in the superposition coding technique discussed in Section 14.5.

An uplink channel, also called a multiple access channel³ or reverse channel, has many transmitters sending
signals to one receiver, where each signal must be within the total system bandwidth B. However, in contrast
to the downlink, in the uplink each user has an individual power constraint associated with its transmitted signal
$s_k(t)$. In addition, since the signals are sent from different transmitters, these transmitters must coordinate if signal
synchronization is required. Figure 14.1 also indicates that the signals of the different users in the uplink travel
through different channels, so even if the transmitted powers $P_k$ are the same, the received powers associated with
the different users will be different if their channel gains are different. Examples of wireless uplinks include laptop
wireless LAN cards transmitting to a wireless LAN access point, transmissions from ground stations to a satellite,
and transmissions from mobile terminals to a base station in cellular systems.

Figure 14.1: Downlink and Uplink Channels.

Most communication systems are bi-directional, and hence consist of both uplinks and downlinks. The radio
transceiver that sends to users over a downlink channel and receives from these users over an uplink channel is
often referred to as an access point or base station. It is generally not possible for radios to receive and transmit
on the same frequency band due to the interference that results. Thus, bi-directional systems must separate the
uplink and downlink channels into orthogonal signaling dimensions, typically using time or frequency dimensions.
This separation is called duplexing. In particular, time-division duplexing (TDD) assigns orthogonal timeslots to a
given user for receiving from an access point and transmitting to the access point, and frequency-division duplexing
(FDD) assigns separate frequency bands for transmitting to and receiving from the access point. An advantage of
TDD is that bi-directional channels are typically symmetrical in their channel gains, so channel measurements
made in one direction can be used to estimate the channel in the other direction. This is not necessarily the case
for FDD in frequency-selective fading: if the frequencies assigned to each direction are separated by more than the
coherence bandwidth associated with the channel multipath, then these channels will exhibit independent fading.

³ Note that multiple access techniques must be applied to both multiple access channels, i.e. uplinks, as well as to downlinks.
14.2 Multiple Access



Efficient allocation of signaling dimensions between users is a key design aspect of both uplink and downlink
channels, since bandwidth is usually scarce and/or very expensive. When dedicated channels are allocated to
users it is often called multiple access⁴. Applications with continuous transmission and delay constraints, such
as voice or video, typically require dedicated channels for good performance to ensure their transmission is not
interrupted. Dedicated channels are obtained from the system signal space using a channelization method such
as time-division, frequency-division, code-division, or hybrid combinations of these techniques. Allocation of
signaling dimensions for users with bursty transmissions generally uses some form of random channel allocation
which does not guarantee channel access. Bandwidth sharing using random channel allocation is called random
multiple access or simply random access, which will be described in Section 14.3. In general, the choice of
whether to use multiple access or random access, and which specific multiple or random access technique to
apply, will depend on the system applications, the traffic characteristics of the users in the system, the performance
requirements, and the characteristics of the channel and other interfering systems operating in the same bandwidth.

Multiple access techniques divide up the total signaling dimensions into channels and then assign these channels
to different users. The most common methods to divide up the signal space are along the time, frequency,
and/or code axes. The different user channels are then created by an orthogonal or non-orthogonal division along
these axes: time-division multiple access (TDMA) and frequency-division multiple access (FDMA) are orthogonal
channelization methods whereas code-division multiple access (CDMA) can be orthogonal or non-orthogonal,
depending on the code design. Directional antennas, often obtained through antenna array processing, add an
additional angular dimension which can also be used to channelize the signal space: this technique is called
space-division multiple access (SDMA). The performance of different multiple access methods depends on whether they
are applied to an uplink or downlink, and their specific characteristics. TDMA, FDMA, and orthogonal CDMA
are all equivalent in the sense that they orthogonally divide up the signaling dimensions, and they therefore create
the same number of orthogonal channels. In particular, given a signal space of dimension 2BT, N orthogonal
channels of dimension 2BT/N can be created, regardless of the channelization method. As a result, all multiple
access techniques that divide the signal space orthogonally have the same channel capacity in AWGN, as will be
discussed in Sections 14.5-14.6. However, channel impairments such as flat and frequency-selective fading affect
these techniques in different ways, which leads to different channel capacities and different performance in practice.

14.2.1 Frequency-Division Multiple Access (FDMA) 

In FDMA the system signaling dimensions are divided along the frequency axis into nonoverlapping channels, 
and each user is assigned a different frequency channel, as shown in Figure 14.2. The channels often have guard 
bands between them to compensate for imperfect filters, adjacent channel interference, and spectral spreading due 
to Doppler. If the channels are sufficiently narrowband then even if the total system bandwidth is large, the indi- 
vidual channels will not experience frequency-selective fading. Transmission is continuous over time, which can 
complicate overhead functions such as channel estimation since these functions must be performed simultaneously 
and in the same bandwidth as data transmission. FDMA also requires frequency-agile radios that can tune to the 
different carriers associated with the different channels. It is difficult to assign multiple channels to the same user 
under FDMA, since this requires the radios to simultaneously demodulate signals received over multiple frequency 
channels. FDMA is the most common multiple access option for analog communication systems, where transmis- 
sion is continuous, and serves as the basis for the AMPS and ETACS analog cellular phone standards [2, Chapter 
11.1]. Multiple access in OFDM systems, called OFDMA, implements FDMA by assigning different subcarriers 
to different users. 

4 An uplink channel is also referred to as a multiple access channel; however, multiple access techniques are needed for both uplinks and 
downlinks. 

Figure 14.2: Frequency-Division Multiple Access. 



Example 14.1: First-generation analog systems were allocated a total bandwidth of $B = 25$ MHz for uplink channels 
and another $B = 25$ MHz for downlink channels. This bandwidth allocation was split between two operators 
in every region, so each operator had 12.5 MHz for each of its uplink and downlink channels. Each user was 
assigned $B_c = 30$ KHz of spectrum for its analog voice signal, corresponding to 24 KHz for the FM modulated 
signal and 3 KHz guardbands on each side. The total uplink and downlink bandwidths also required guard bands of 
$B_g = 10$ KHz on each side to mitigate interference to and from adjacent systems. Find the total number of analog 
voice users that could be supported in the total 25 MHz of bandwidth allocated to the uplink and the downlink. 
Also consider a more efficient digital system with high-level modulation so that only 10 KHz channels are required 
for a digital voice signal with tighter filtering such that only 5 KHz guard bands are required on the band edges. 
How many users can be supported in the same 25 MHz of spectrum for this more efficient digital system? 



Solution: For either the uplink or the downlink, each user occupies a channel of bandwidth $B_c$, which already 
includes the guardbands on each side of the voice signal, and the band as a whole gives up $2B_g$ to the guard bands 
at its two edges. Thus, the total number of users that can be supported in the total uplink or downlink bandwidth 
$B = 25$ MHz is 

$$N = \left\lfloor \frac{B - 2B_g}{B_c} \right\rfloor = \left\lfloor \frac{25 \times 10^6 - 2 \times 10 \times 10^3}{30 \times 10^3} \right\rfloor = 832,$$

or 416 users per operator. Indeed, first-generation analog systems could support 832 users in each cell. The digital 
system has 

$$N = \left\lfloor \frac{B - 2B_g}{B_c} \right\rfloor = \left\lfloor \frac{25 \times 10^6 - 2 \times 5 \times 10^3}{10 \times 10^3} \right\rfloor = 2499$$

users that can be supported in each cell, roughly a threefold increase over the analog system. The increase is 
primarily due to the bandwidth savings of the high-level digital modulation, which can accommodate a voice 
signal in one third the bandwidth of the analog voice signal. 
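
The arithmetic in Example 14.1 can be checked with a short Python sketch. The function name `fdma_channels` and its parameter names are illustrative assumptions, not notation from the text. The sketch assumes that the channel bandwidth already includes any per-channel guardbands and that only whole channels count.

```python
def fdma_channels(total_bw_hz, channel_bw_hz, edge_guard_hz):
    """Whole channels that fit once the guard bands at both band edges are removed."""
    usable_hz = total_bw_hz - 2 * edge_guard_hz
    return usable_hz // channel_bw_hz  # integer division keeps only whole channels

# Analog system: 25 MHz band, 30 KHz channels, 10 KHz edge guard bands.
analog_users = fdma_channels(25_000_000, 30_000, 10_000)
# Digital system: 25 MHz band, 10 KHz channels, 5 KHz edge guard bands.
digital_users = fdma_channels(25_000_000, 10_000, 5_000)

print(analog_users, analog_users // 2, digital_users)  # 832 416 2499
```

The ratio `digital_users / analog_users` is close to 3, in line with the threefold reduction in channel bandwidth.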
14.2.2 Time-Division Multiple Access (TDMA) 



In TDMA the system dimensions are divided along the time axis into nonoverlapping channels, and each user is 
assigned a different cyclically-repeating timeslot, as shown in Figure 14.3. These TDMA channels occupy the 
entire system bandwidth, which is typically wideband, so some form of ISI mitigation is required. The 
cyclically-repeating timeslots imply that transmission is not continuous for any user. Therefore, digital transmission 
techniques which allow for buffering are required. The fact that transmission is not continuous simplifies overhead 
functions such as channel estimation, since these functions can be done during the timeslots occupied by other 
users. TDMA also has the advantage that it is simple to assign multiple channels to a single user by 
assigning that user multiple timeslots. 

A major difficulty of TDMA, at least for uplink channels, is the requirement for synchronization among the 
different users. Specifically, in a downlink channel all signals originate from the same transmitter and pass through 
the same channel to any given receiver. Thus, for flat-fading channels, if users transmit on orthogonal timeslots the 
received signal will maintain this orthogonality. However, in the uplink channel the users transmit over different 
channels with different respective delays. To maintain orthogonal timeslots in the received signals, the different 
uplink transmitters must synchronize such that after transmission through their respective channels, the received 
signals are orthogonal in time. This synchronization is typically coordinated by the base station or access point, 
and can entail significant overhead. Multipath can also destroy time-division orthogonality in both uplinks and 
downlinks if the multipath delays are a significant fraction of a timeslot. TDMA channels therefore often have 
guard times between them to compensate for synchronization errors and multipath. Another difficulty of TDMA 
is that with cyclically repeating timeslots the channel characteristics change on each cycle. Thus, receiver functions 
that require channel estimates, like equalization, must re-estimate the channel on each cycle. When transmission 
is continuous, the channel can be tracked, which is more efficient. TDMA is used in the GSM, PDC, IS-54, and 
IS-136 digital cellular phone standards [2, Chapter 11]. 




Figure 14.3: Time-Division Multiple Access. 



Example 14.2: The original GSM design uses 25 MHz of bandwidth for the uplink and for the downlink, the same 
as AMPS. This bandwidth is divided into 125 TDMA channels of 200 KHz each. Each TDMA channel consists of 
8 user timeslots: the 8 timeslots along with a preamble and trailing bits form a frame, which is cyclically repeated 
in time. Find the total number of users that can be supported in the GSM system and the channel bandwidth of 
each user. If the rms delay spread of the channel is 10 μs, will ISI mitigation be needed in this system? 

Solution: Since there are 8 users per channel and 125 channels, the total number of users that can be supported 
in this system is $125 \times 8 = 1000$ users. The bandwidth of each TDMA channel is $25 \times 10^6/125 = 200$ KHz. 
A delay spread of 10 μs corresponds to a channel coherence bandwidth of $B_c \approx 100$ KHz, which is less than 
the TDMA channel bandwidth of 200 KHz. Thus, ISI mitigation is needed. The GSM specification includes an 
equalizer to compensate for ISI, but the type of equalizer is at the discretion of the designer. 
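
The steps of Example 14.2 can be sketched in Python as follows. The helper `tdma_summary` is a hypothetical name, and the coherence bandwidth uses the rule of thumb that it is roughly the reciprocal of the rms delay spread, as in the solution above.

```python
def tdma_summary(total_bw_hz, n_carriers, slots_per_carrier, rms_delay_spread_s):
    """Return (users, carrier bandwidth, coherence bandwidth, ISI-mitigation flag)."""
    users = n_carriers * slots_per_carrier
    carrier_bw_hz = total_bw_hz / n_carriers
    # Rule-of-thumb coherence bandwidth: B_c ~ 1 / (rms delay spread).
    coherence_bw_hz = 1.0 / rms_delay_spread_s
    # A carrier wider than the coherence bandwidth sees frequency-selective fading.
    needs_isi_mitigation = carrier_bw_hz > coherence_bw_hz
    return users, carrier_bw_hz, coherence_bw_hz, needs_isi_mitigation

users, carrier_bw_hz, coherence_bw_hz, needs_isi = tdma_summary(25e6, 125, 8, 10e-6)
print(users, carrier_bw_hz, needs_isi)  # 1000 200000.0 True
```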



14.2.3 Code-Division Multiple Access (CDMA) 

In CDMA the information signals of different users are modulated by orthogonal or non-orthogonal spreading 
codes. The resulting spread signals simultaneously occupy the same time and bandwidth, as shown in Figure 14.4. 
The receiver uses the spreading code structure to separate out the different users. The most common form of 
CDMA is multiuser spread spectrum with either DS or FH, which are described and analyzed in Chapters 13.4-13.5. 

Downlinks typically use orthogonal spreading codes such as Walsh-Hadamard codes, although the orthog- 
onality can be degraded by multipath. Uplinks generally use non-orthogonal codes due to the difficulty of user 
synchronization and the complexity of maintaining code orthogonality in uplinks with multipath [5]. One of the 
big advantages of non-orthogonal CDMA in uplinks is that little dynamic coordination of users in time or fre- 
quency is required, since the users can be separated by the code properties alone. In addition, since TDMA and 
FDMA carve up the signaling dimensions orthogonally, there is a hard limit on how many orthogonal channels 
can be obtained. This is also true for CDMA using orthogonal codes. If non-orthogonal codes are used, 
there is no hard limit on the number of channels that can be obtained, but because non-orthogonal codes 
cause mutual interference between users, the more users that simultaneously share the system bandwidth using 
non-orthogonal codes, the higher the level of interference, which degrades the performance of all the users. A 
non-orthogonal CDMA scheme also requires power control in the uplink to compensate for the near-far effect. 
The near-far effect arises in the uplink because the channel gain between a user’s transmitter and the receiver is 
different for different users. Specifically, suppose that one user is very close to his base station or access point, 
and another user very far away. If both users transmit at the same power level, then the interference from the close 
user will swamp the signal from the far user. Thus, power control is used such that the received signal power of 
all users is roughly the same. This form of power control, which essentially inverts any attenuation and/or fading 
on the channel, causes each interferer to contribute an equal amount of power, thereby eliminating the near-far 
effect. CDMA systems with non-orthogonal spreading codes can also use MUD to reduce interference between 
users. MUD provides considerable performance improvement even under perfect power control, and works even 
better when the power control is jointly optimized with the MUD technique [6]. We will see in Sections 14.5-14.6 
that CDMA with different forms of multiuser detection achieves the Shannon capacity of both the uplink and the 
downlink, although the capacity-achieving transmission and reception strategies for the two channels are very dif- 
ferent. Finally, it is simple to allocate multiple channels to one user with CDMA by assigning that user multiple 
codes. CDMA is used for multiple access in the IS-95 digital cellular standards, with orthogonal spreading codes 
on the downlink and a combination of orthogonal and non-orthogonal codes on the uplink [2, Chapter 11.4]. It is 
also used in the W-CDMA and CDMA2000 digital cellular standards [4, Chapter 10.5]. 
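
The near-far effect and its removal by channel-inversion power control, described above, can be illustrated with a minimal Python sketch. The two path gains, the target received power, and all function names are hypothetical values chosen for illustration. The SIR here is computed before despreading, so it does not include the processing gain of the spreading code.

```python
def received_powers(tx_powers, gains):
    """Received power of each user at the base station."""
    return [p * g for p, g in zip(tx_powers, gains)]

def sir(user, rx_powers):
    """Signal-to-interference ratio of one user against all other users."""
    interference = sum(rx_powers) - rx_powers[user]
    return rx_powers[user] / interference

def channel_inversion(gains, target_rx_power):
    """Transmit powers that make every user arrive at the same received power."""
    return [target_rx_power / g for g in gains]

gains = [1e-6, 1e-10]          # assumed path gains: near user, far user

# Equal transmit powers: the near user swamps the far user.
rx_equal = received_powers([1.0, 1.0], gains)
sir_far_equal = sir(1, rx_equal)          # about 1e-4, i.e. -40 dB

# Channel inversion: both users arrive at the same power level.
rx_controlled = received_powers(channel_inversion(gains, 1e-12), gains)
sir_far_controlled = sir(1, rx_controlled)  # about 1, i.e. 0 dB
```

With power control every interferer contributes equally, so the far user's SIR no longer depends on where the other users are located.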



Figure 14.4: Code-Division Multiple Access. 

Example 14.3: The SIR for a CDMA uplink with non-orthogonal codes under the standard Gaussian assumption 
was given in (13.47) as 

$$\mathrm{SIR} = \frac{3G}{K - 1},$$

where K is the number of users and $G \approx 128$ is the ratio of spread bandwidth to signal bandwidth. In IS-95 the 
uplink channel is assigned 1.25 MHz of spectrum. Thus, the bandwidth of the information signal prior to 
spreading is $B_s \approx 1.25 \times 10^6/128 = 9.765$ KHz. Neglecting noise, if the required SIR on a channel is 10 dB, how 
many users can the CDMA uplink support? How many could be supported within the same total bandwidth for an 
FDMA system? 



Solution: To determine how many users can be supported, we invert the SIR expression to get 

$$K \le \frac{3G}{\mathrm{SIR}} + 1 = \frac{384}{10} + 1 = 39.4,$$

and since K must be an integer, the system can support 39 users. In FDMA we have 

$$N = \frac{1.25 \times 10^6}{9.765 \times 10^3} = 128,$$

so the total system bandwidth of 1.25 MHz can support 128 channels of 9.765 KHz. This calculation implies 
that FDMA is roughly three times more efficient than non-orthogonal CDMA under the standard Gaussian assumption for 
code cross-correlation (FDMA is even more efficient under different assumptions about the code cross-correlation). 
But in fact, IS-95 typically supports 64 users on the uplink and downlink by allowing variable voice compression 
rates depending on interference and channel quality and taking advantage of the fact that interference is not always 
present (called a voice-activity factor). While this makes CDMA less efficient than FDMA for a single cell, cellular 
systems have channel reuse, which can be done more efficiently in CDMA than in FDMA, as discussed in more 
detail in Chapter 15.2. 
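
The comparison in Example 14.3 can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the example (interference-limited uplink, standard Gaussian approximation, so that the SIR is 3G/(K - 1)); the function names are hypothetical.

```python
import math

def cdma_max_users(processing_gain, required_sir_db):
    """Largest integer K with 3G / (K - 1) >= required SIR."""
    required_sir = 10 ** (required_sir_db / 10)   # dB to linear
    return math.floor(3 * processing_gain / required_sir + 1)

def fdma_channels(total_bw_hz, signal_bw_hz):
    """Whole FDMA channels of the unspread signal bandwidth (no guard bands)."""
    return int(total_bw_hz // signal_bw_hz)

G = 128
total_bw_hz = 1.25e6
signal_bw_hz = total_bw_hz / G            # 9765.625 Hz before spreading

cdma_users = cdma_max_users(G, 10)
fdma_users = fdma_channels(total_bw_hz, signal_bw_hz)
print(cdma_users, fdma_users)  # 39 128
```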
14.2.4 Space-Division Multiple Access (SDMA) 



Space-division multiple access (SDMA) uses direction (angle) as another dimension in signal space, which can 
be channelized and assigned to different users. This is generally done with directional antennas, as shown in 
Figure 14.5. Orthogonal channels can only be assigned if the angular separation between users exceeds the angular 
resolution of the directional antenna. If directionality is obtained using an antenna array, precise angular resolution 
requires a very large array, which may be impractical for the base station or access point and is certainly infeasible 
in small user terminals. In practice SDMA is often implemented using sectorized antenna arrays, discussed in 
Chapter 10.8. In these arrays the 360° angular range is divided into N sectors. There is high directional gain in 
each sector and little interference between sectors. TDMA or FDMA is used to channelize users within a sector. 
For mobile users SDMA must adapt as user angles change or, if directionality is achieved via sectorized antennas, 
then a user must be handed off to a new sector when it moves out of its original sector. 
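
As a rough illustration of sectorized SDMA, the Python sketch below maps a user's azimuth to one of N equal-width sectors and flags when a move requires a handoff. The convention that sector 0 starts at azimuth 0° and the function names are assumptions made for this sketch, not details from the text.

```python
def sector_index(azimuth_deg, n_sectors):
    """Index of the equal-width sector that contains the given azimuth."""
    width_deg = 360.0 / n_sectors
    return int((azimuth_deg % 360.0) // width_deg)

def needs_handoff(old_azimuth_deg, new_azimuth_deg, n_sectors):
    """True when a user's movement takes it into a different sector."""
    return sector_index(old_azimuth_deg, n_sectors) != sector_index(new_azimuth_deg, n_sectors)

# Three 120-degree sectors: a user drifting from 119 to 121 degrees changes sector.
print(sector_index(119.0, 3), sector_index(121.0, 3), needs_handoff(119.0, 121.0, 3))  # 0 1 True
```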




Figure 14.5: Space-Division Multiple Access. 



14.2.5 Hybrid Techniques 

Many systems use a combination of different multiple access schemes to allocate signaling dimensions. OFDMA 
can be combined with tone hopping to improve frequency diversity [9]. DSSS can be combined with FDMA to 
break the system bandwidth into subbands. In this hybrid method different users are assigned to different subbands 
with their signals spread across the subband bandwidth. Within a subband, the processing gain is smaller than it 
would be over the entire system bandwidth, so interference and ISI rejection are reduced. However, this technique 
does not require contiguous spectrum between subbands, and also allows more flexibility in spreading user signals 
over different size subbands depending on their requirements. Another hybrid method combines DS-CDMA with 
FH-CDMA so that the carrier frequency of the spread signal is hopped over the available bandwidth. This reduces 
the near-far effect since the interfering users change on each hop. Alternatively, TDMA and FH can be combined 
so that a channel with deep fading or interference is only used on periodic hops, so that the fading and interference 
effects can be mitigated by error correction coding. This idea is used in the GSM standard, which combines FH 
with its TDMA scheme to reduce the effect of strong interferers in other cells. 

There has been much discussion, debate, and analysis about the relative performance of different multiple 
access techniques for current and future wireless systems, e.g. [8, 9, 10, 11, 12, 13]. While analysis and general 
conclusions can be made for simple system and channel models, it is difficult to come up with a definitive answer 
as to the best technique for a complex multiuser system under a range of typical operating conditions. Moreover, 
simplifying assumptions must be made to perform a comparative analysis or simulation study, and these assump- 
tions can bias the results in favor of one particular scheme. As with most engineering design questions, the choice 
of which multiple access technique to use will depend on the system requirements and characteristics along with 
cost and complexity constraints. 

14.3 Random Access 

Multiple access techniques are primarily for continuous-time applications like voice and video, where a dedicated 
channel facilitates good performance. However, most data users do not require continuous transmission: data is 
generated at random time instances, so dedicated channel assignment can be extremely inefficient. Moreover, most 
systems have many more total users (active plus idle users) than can be accommodated simultaneously, so at any 
given time channels can only be allocated to users that need them. Random access strategies are used in such 
systems to efficiently assign channels to the active users. 

All random access techniques are based on the premise of packetized data or packet radio. In packet radio 
user data is collected into packets of N bits, and once a packet is formed it is transmitted over the channel. 
Assuming a fixed channel data rate of R bps, the transmission time of a packet is $\tau = N/R$. The transmission 
rate R is assumed to require the entire signal bandwidth, and all users transmit their packets over this bandwidth, 
with no additional coding that would allow separation of simultaneously transmitted packets. Thus, if packets 
from different users overlap in time a collision occurs, in which case neither packet may be decoded successfully. 
Analysis of random access techniques typically assumes that collectively the users accessing the channel generate 
packets according to a Poisson process at a rate of $\lambda$ packets per unit time, i.e. $\lambda$ is the average number of packets 
that arrive in any time interval [0, t] divided by t. Equivalently, $\lambda N$ is the average number of bits generated in any 
time interval [0, t] divided by t. For a Poisson process, the probability that the number of packet arrivals in a time 
period [0, t], denoted as X(t), is equal to some integer k is given by 

$$p(X(t) = k) = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t}. \qquad (14.1)$$

Poisson processes are memoryless, so that the number of packet arrivals during any given time period does not 
affect the distribution of packet arrivals in any other time period. Note that the Poisson model is not necessarily a 
good model for all types of user traffic, especially Internet data, where bursty data causes correlated packet arrivals 
[14]. 
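
The Poisson probability in (14.1) is easy to evaluate numerically. The Python sketch below is illustrative only: the function name `poisson_pmf` and the example rate and interval are assumptions made here.

```python
import math

def poisson_pmf(k, lam, t):
    """P(X(t) = k) for a Poisson process of rate lam observed over [0, t]."""
    mean = lam * t
    return mean ** k / math.factorial(k) * math.exp(-mean)

lam, t = 2.0, 1.5   # assumed rate (packets per second) and interval length (seconds)

p_idle = poisson_pmf(0, lam, t)                         # no arrivals: exp(-lam * t)
total = sum(poisson_pmf(k, lam, t) for k in range(60))  # probabilities sum to ~1
```

The k = 0 case, `p_idle`, is the quantity used later to compute the probability that a packet sees no collision.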

The traffic load on the channel given Poisson packet arrivals at rate $\lambda$ and packet transmission duration $\tau$ is 
defined as $L = \lambda\tau$. If the channel data rate is $R_p$ packets per second then $\tau = 1/R_p = N/R$ for R the channel 
data rate in bps. Note that L is unitless: it is the ratio of the packet arrival rate to the packet rate that can 
be transmitted over the channel at the channel’s data rate R. If $L > 1$ then on average more packets (or bits) arrive 
in the system over a given time period than can be transmitted in that period, so systems with $L > 1$ are unstable. 
If the transmitter is informed by the receiver about packets received in error and retransmits these packets, then 
the packet arrival rate $\lambda$ and corresponding load $L = \lambda\tau$ are computed based on arrivals of both new packets and 
packets that require retransmission. In this case L is referred to as the total offered load. 
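
A small Python sketch of the load calculation follows. The function names and the example arrival rates are hypothetical; the packet size and link rate are chosen only to make the numbers round.

```python
def offered_load(arrival_rate_pps, packet_bits, link_rate_bps):
    """Unitless load: arrival rate times packet duration (packet bits / link rate)."""
    packet_duration_s = packet_bits / link_rate_bps
    return arrival_rate_pps * packet_duration_s

def is_unstable(load):
    """Loads above 1 bring in more packets than the channel can carry."""
    return load > 1.0

# 1000-bit packets on a 10 Mbps link last 0.1 ms each.
light_load = offered_load(5_000, 1_000, 10e6)    # about 0.5
heavy_load = offered_load(20_000, 1_000, 10e6)   # about 2.0
print(is_unstable(light_load), is_unstable(heavy_load))  # False True
```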

Performance of random access techniques is typically characterized by the throughput T of the system. The 
throughput, which is unitless, is defined as the average number of packets successfully transmitted 
in any given time interval divided by the number of packets the channel could carry in that interval at its packet 
rate. The throughput thus equals the offered load multiplied by the probability of successful packet reception, 
$T = L\,p(\text{successful packet reception})$, 
where this probability is a function of the random access protocol in use as well as the channel characteristics, 
which can cause packet errors in the absence of collisions. If we assume that colliding packets always cause errors, 
then $T \le 1$, since no more than one packet can be successfully transmitted at any one time. Moreover, since a 
system with $L > 1$ is unstable, stable systems where colliding packets always cause errors have $T \le L \le 1$. 

Note that the throughput is independent of the channel data rate R, since the load and corresponding throughput 
are normalized with respect to this rate. This allows analysis of random access protocols to be generic to any 
underlying link design or channel capacity. For a packet radio with a link data rate of R bps, the effective data 
rate of the system is RT, since T is the fraction of packets or bits successfully transmitted at rate R. The goal of a 
random access method is to make T as large as possible in order to fully utilize the underlying link rates. Note that 
in some circumstances overlapping packets do not cause a collision. In particular, short periods of overlap between 
colliding packets, different channel gains on the received packets, and/or error correction coding can allow one or 
more packets to be successfully received even with a collision. This is called the capture effect [15, Chapter 4.3]. 

Random access techniques were pioneered by Abramson with the ALOHA protocol [16], where data is packe- 
tized and users send packets whenever they have data to send. ALOHA is very inefficient due to collisions between 
users, which leads to very low throughput. The throughput can be doubled by slotting time and synchronizing the 
users, but even then collisions lead to relatively low throughput values. Modifications to ALOHA protocols to 
avoid collisions and thereby increase throughput include carrier sensing, collision detection, and collision avoid- 
ance. Long bursts of packets can be scheduled to avoid collisions, but this typically takes additional overhead. 

In this section we will describe the various techniques for random access, their performance, and their design 
tradeoffs. 

14.3.1 Pure ALOHA 

In pure or unslotted ALOHA users transmit data packets as soon as they are formed. If we neglect the capture 
effect, then packets that overlap in time are assumed to be received in error, and must be retransmitted. If we also 
assume packets that do not collide are successfully received (i.e. there is no channel distortion or noise), then the 
throughput equals the offered load times the probability of no collisions: $T = L\,p(\text{no collisions})$. Suppose a given 
user transmits a packet of duration $\tau$ during time $[0, \tau]$. Then if any other user generates a packet during time 
$[-\tau, \tau]$, that packet, of duration $\tau$, will overlap with the transmitted packet, causing a collision. The 
probability that no packets are generated during the time $[-\tau, \tau]$ is given by (14.1) with $t = 2\tau$: 

$$p(X(t) = 0) = e^{-2\lambda\tau} = e^{-2L}, \qquad (14.2)$$

with corresponding throughput 

$$T = L e^{-2L}. \qquad (14.3)$$

This throughput is plotted in Figure 14.6, where we see that throughput increases with offered load up to a max- 
imum throughput of approximately .18 for L = .5, after which point it decreases. In other words, the data rate 
is only 18% of what it would be with a single user transmitting continuously on the system. The reason for this 
maximum is that for small values of L there are many idle periods when no user is transmitting, so throughput is 
small. As L increases, the channel is utilized more but collisions also start to occur. At L = .5 there is the opti- 
mal balance between users generating enough packets to utilize the channel with reasonable efficiency and these 
packet generations colliding infrequently. Beyond L = .5 the collisions become more frequent, which degrades 
throughput below its maximum, and as L grows very large, most packets experience collisions, and throughput 
approaches zero. 
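
The location and value of this maximum follow from differentiating the throughput expression (14.3). The short derivation below uses only that expression; the labels $L^{*}$ and $T_{\max}$ are introduced here for convenience.

```latex
\begin{align*}
\frac{dT}{dL} &= \frac{d}{dL}\left( L e^{-2L} \right) = (1 - 2L)\, e^{-2L} = 0
  \quad\Longrightarrow\quad L^{*} = \tfrac{1}{2}, \\
\left. \frac{d^{2}T}{dL^{2}} \right|_{L = L^{*}} &= (4L^{*} - 4)\, e^{-2L^{*}} = -2e^{-1} < 0
  \quad \text{(so } L^{*} \text{ is a maximum)}, \\
T_{\max} &= T(L^{*}) = \tfrac{1}{2}\, e^{-1} = \frac{1}{2e} \approx 0.184 .
\end{align*}
```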

Part of the reason for the inefficiency of pure ALOHA is the fact that users can start their packet transmissions 
at any time, and any partial overlap of two or more packets destroys the successful reception of all packets. By 
synchronizing users such that all packet transmissions are aligned in time, the partial overlap of packet transmissions 
can be avoided. That is the basic premise behind slotted ALOHA. 

Figure 14.6: Throughput of Pure and Slotted ALOHA. 



14.3.2 Slotted ALOHA 

In slotted ALOHA, time is assumed to be slotted in timeslots of duration $\tau$, and users can only start their packet 
transmissions at the beginning of the next timeslot after the packet has formed. Thus, there is no partial overlap 
of transmitted packets, which increases throughput. Specifically, a packet transmitted over the time period $[0, \tau]$ 
is successfully received if no other packets are transmitted during this period. This probability is obtained from 
(14.1) with $t = \tau$: $p(X(\tau) = 0) = e^{-\lambda\tau} = e^{-L}$, with corresponding throughput 

$$T = L e^{-L}. \qquad (14.4)$$

This throughput is also plotted in Figure 14.6, where we see that throughput increases with offered load up to a 
maximum throughput of approximately T = .37 for L = 1, after which point it decreases. Thus, slotted ALOHA 
has double the maximum throughput of pure ALOHA, and achieves this maximum at a higher offered load. While 
this represents a marked improvement over pure ALOHA, the effective data rate is still less than 40% of the raw 
transmission rate. This is extremely wasteful of the limited wireless bandwidth, so more sophisticated techniques 
are needed to increase efficiency. 
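
The slotted ALOHA throughput formula can be sanity-checked with a small Monte Carlo experiment. The Python sketch below is illustrative: it assumes that all transmission attempts (new packets plus retransmissions) form a single Poisson process with load L per slot, and it counts a slot as successful when exactly one packet is assigned to it. Packets are counted in the slot where they arrive rather than in the following slot, which does not change the statistics. The function names and the seed are arbitrary choices.

```python
import math
import random

def slotted_aloha_throughput(load):
    """Analytical throughput T = L * exp(-L)."""
    return load * math.exp(-load)

def simulate_slotted_aloha(load, n_slots, seed=0):
    """Fraction of unit-length slots that contain exactly one Poisson arrival."""
    rng = random.Random(seed)
    packets_per_slot = [0] * n_slots
    arrival_time = rng.expovariate(load)      # exponential gaps give Poisson arrivals
    while arrival_time < n_slots:
        packets_per_slot[int(arrival_time)] += 1
        arrival_time += rng.expovariate(load)
    successes = sum(1 for count in packets_per_slot if count == 1)
    return successes / n_slots

simulated = simulate_slotted_aloha(1.0, 200_000)
predicted = slotted_aloha_throughput(1.0)     # 1/e, about 0.368
```

With 200,000 slots the simulated value should agree with the formula to roughly two decimal places.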

Note that slotted ALOHA requires synchronization of all nodes in the network, which can entail significant 
overhead. Even in a slotted system, collisions occur whenever two or more users attempt transmission in the same 
slot. Error control coding can result in correct detection of a packet even after a collision, but if the error correction 
is insufficient then the packet must be retransmitted. A study on design optimization between error correction and 
retransmission is described in [19]. 



Example 14.4: Consider a slotted ALOHA system with a transmission rate of R = 10 Mbps. Suppose packets 
consist of 1000 bits. For what packet arrival rate $\lambda$ will the system achieve maximum throughput, and what is the 
effective data rate associated with this throughput? 

Solution: The throughput T is maximized for $L = \lambda\tau = 1$, where $\lambda$ is the packet arrival rate and $\tau$ is the packet 
duration. With a 10 Mbps transmission rate and 1000 bits/packet, $\tau = 1000/10^7 = 0.1$ ms. Thus, $\lambda = 1/0.0001 = 
10^4$ packets per second maximizes throughput. The throughput for L = 1 is T = .37, so the effective data rate 
is TR = 3.7 Mbps. Thus, the data rate is reduced by roughly a factor of 3 as compared to continuous data 
transmission due to the random nature of the packet arrivals and their corresponding collisions. 
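
The numbers in Example 14.4 can be reproduced directly. In this Python sketch the variable names are illustrative, and the throughput at L = 1 is computed as 1/e rather than rounded to .37.

```python
import math

link_rate_bps = 10e6      # R = 10 Mbps
packet_bits = 1000        # N = 1000 bits

packet_duration_s = packet_bits / link_rate_bps      # 1e-4 s, i.e. 0.1 ms
best_arrival_rate_pps = 1.0 / packet_duration_s      # L = 1 requires 1e4 packets/s
max_throughput = 1.0 * math.exp(-1.0)                # T = L * exp(-L) at L = 1
effective_rate_bps = max_throughput * link_rate_bps  # about 3.68 Mbps
```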



14.3.3 Carrier Sense Multiple Access 

Collisions can be reduced by Carrier Sense Multiple Access (CSMA), where users sense the channel and delay 
transmission if they detect that another user is currently transmitting. To be effective, detection time and 
propagation delays in the system must be small [3, Chapter 4.19]. Typically a user waits to transmit a random time period 
after sensing a busy channel. This random backoff avoids multiple users simultaneously transmitting as soon as 
the channel is free. CSMA only works when all users can detect each other’s transmissions and the propagation 
delays are small. Wired LANs have these characteristics, hence CSMA is part of the Ethernet protocol. However, 
the nature of the wireless channel may prevent a given user from detecting the signals transmitted by all other 
users. This gives rise to the hidden terminal problem, illustrated in Figure 14.7, where each node can hear its 
immediate neighbor but no other nodes in the network. In this figure both node 3 and node 5 wish to transmit to 
node 4. Suppose node 5 starts its transmission. Since node 3 is too far away to detect this transmission, it assumes 
that the channel is idle and begins its transmission, thereby causing a collision with node 5’s transmission. Node 
3 is said to be hidden from node 5 since it cannot detect node 5’s transmission. ALOHA with CSMA also creates 
inefficiencies in channel utilization from the exposed terminal problem, also illustrated in Figure 14.7. Suppose 
the exposed terminal in this figure, node 2, wishes to send a packet to node 1 at the same time node 3 is sending 
to node 4. When node 2 senses the channel it will detect node 3’s transmission and assume the channel is busy, 
even though node 3 does not interfere with the reception of node 2’s transmission by node 1. Thus node 2 will 
not transmit to node 1 even though no collision would have occurred. Exposed terminals only occur in multihop 
networks, so we will defer their discussion until Chapter 16. 
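
The hidden and exposed terminal cases can be made concrete with a small Python sketch. It assumes the topology described above, five nodes on a line where each node hears only its immediate neighbors; the function names are hypothetical.

```python
def neighbors(node, n_nodes):
    """Nodes within hearing range on a line topology numbered 1..n_nodes."""
    return {m for m in (node - 1, node + 1) if 1 <= m <= n_nodes}

def hidden_from(sender, receiver, n_nodes):
    """Nodes that can reach the receiver but cannot hear the sender."""
    return neighbors(receiver, n_nodes) - neighbors(sender, n_nodes) - {sender}

def is_exposed(sender, receiver, active_sender, active_receiver, n_nodes):
    """True if the sender defers to a transmission that would not have collided."""
    hears_active = active_sender in neighbors(sender, n_nodes)
    harmless = (active_sender not in neighbors(receiver, n_nodes)
                and sender not in neighbors(active_receiver, n_nodes))
    return hears_active and harmless

print(hidden_from(5, 4, 5))        # {3}: node 3 cannot hear node 5 sending to node 4
print(is_exposed(2, 1, 3, 4, 5))   # True: node 2 defers to node 3 unnecessarily
```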
Figure 14.7: Hidden and Exposed Terminals. 



The collisions introduced by hidden terminals are often avoided in wireless networks by a four-way handshake 
prior to transmission [20, 17]. This collision avoidance is done as follows. A node that wants to send a data packet 
will first wait for the channel to become available and then transmit a short RTS (Request To Send) packet. The 
potential receiver, assuming it perceives an available channel, will immediately respond with a CTS (Clear To 
Send) packet that authorizes the initiating node to transmit, and also informs neighboring hidden nodes (i.e., nodes 
that are outside the communication range of the transmitter but within the communication range of the receiver) 
that they will have to remain silent for the duration of the transmission. Nodes that overhear the RTS or CTS packet 
will refrain from transmitting over the expected packet duration. A node can only send an RTS packet if it perceives 
an idle channel and has not been silenced by another control packet. A node will only transmit a CTS packet if 
it has not been silenced by another control packet. The RTS/CTS handshake is typically coupled with random 
backoff to avoid all nodes transmitting as soon as the channel becomes available. In some incarnations [17, 18], 
including the 802.11 WLAN standard [4, Chapter 14.3], the receiver sends an ACK (Acknowledgement) packet 
back to the transmitter to verify when it has correctly received the packet, after which the channel again becomes 
available. 
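
A minimal Python sketch of which nodes an RTS/CTS exchange silences is given below. It assumes a line topology in which each node hears only its immediate neighbors, and it ignores timing, backoff, and the ACK; the function names are hypothetical.

```python
def neighbors(node, n_nodes):
    """Nodes within hearing range on a line topology numbered 1..n_nodes."""
    return {m for m in (node - 1, node + 1) if 1 <= m <= n_nodes}

def silenced_by_handshake(sender, receiver, n_nodes):
    """Nodes that overhear the RTS or the CTS and must stay quiet."""
    overhear_rts = neighbors(sender, n_nodes) - {receiver}
    overhear_cts = neighbors(receiver, n_nodes) - {sender}
    return overhear_rts | overhear_cts

# Node 5 sends to node 4: the CTS from node 4 silences the hidden node 3.
print(silenced_by_handshake(5, 4, 5))   # {3}
```

In this topology node 3 never hears node 5's RTS, so it is the CTS from node 4 that prevents the collision.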

Another technique to avoid hidden terminals is busy tone transmission. In this strategy users first check to see 
whether the transmit channel is busy by listening for a “busy tone” on a separate control channel [1, Chapter 4.6]. 
There is typically not an actual busy tone but instead a bit is set in a predetermined field on the control channel. 
This scheme works well in preventing collisions when a centralized controller can be “heard” by users throughout 
the network. In a flat network without centralized control, more complicated measures are used to ensure that any 
potential interferer on the first channel can hear the busy tone on the second [21, 22]. Hybrid techniques using 
handshakes, busy tone transmission, and power control can also be used [22]. Collisions can also be reduced by 
combining DSSS with ALOHA. In this scheme each user modulates his signal with the same code, but if user 
transmissions are separated by more than a chip time, the interference due to a collision is reduced by the code 
autocorrelation [23]. 

14.3.4 Scheduling 

Random access protocols work well with bursty traffic where there are many more users than available channels, 
yet these users rarely transmit. If users have long strings of packets or continuous stream data, then random 
access works poorly as most transmissions result in collisions. In this scenario performance can be improved by 
assigning channels to users in a more systematic fashion through transmission scheduling. In scheduled access the 
available bandwidth is channelized into multiple time, frequency, or code division channels. Each node schedules 
its transmission on different channels in such a way as to avoid conflicts with neighboring nodes while making the 
most efficient use of the available signaling dimensions. 

Even with a scheduling access protocol, some form of ALOHA will still be needed since a predefined mech- 
anism for scheduling will be, by definition of random access, unavailable at startup. ALOHA provides a means 
for initial contact and the establishment of some form of scheduled access for the transmission of relatively large 
amounts of data. A systematic approach to this initialization that also combines the benefits of random access 
for bursty data with scheduling for continuous data is packet reservation multiple access (PRMA) [24]. PRMA 
assumes a slotted system with both continuous and bursty users (e.g. voice and data users). Multiple users vie for a 
given time slot under a random access strategy. A successful transmission by one user in a given timeslot reserves 
that timeslot for all subsequent transmissions by the same user. If the user has a continuous or long transmission 
then after successfully capturing the channel he has a dedicated channel for the remainder of his transmission 
(assuming subsequent transmissions are not corrupted by the channel: this corruption causes users to lose their 
slots and they must then recontend for an unreserved slot, which can entail significant delay and packet dropping 
[25]). When this user has no more packets to transmit, the slot is returned to the pool of available slots that users 
attempt to capture via random access. Thus, data users with short transmissions benefit from the random access 
protocol assigned to unused slots, and users with continuous transmissions get scheduled periodic transmissions 
after successfully capturing an initial slot. A similar technique using a combined reservation and ALOHA policy 
is described in [100]. 

14.4 Power Control 

Power control is applied to systems where users interfere with each other. The goal of power control is to adjust 
the transmit powers of all users such that the SINR of each user meets a given threshold required for acceptable 
performance. This threshold may be different for different users, depending on their required performance. This 
problem is straightforward for the downlink, where both users and interferers have the same channel gains, but is 
more complicated in the uplink, where the channel gains may be different. Seminal work on power control for 
cellular systems and ad-hoc networks was done in [30, 31, 32], and power control for the uplink is a special 
case for which these results can be applied. In the uplink model, the kth transmitter has a fixed channel power gain 
$g_k$ to the receiver. The quality of each link is determined by the SIR at the intended receiver. In an uplink with K 
interfering users we denote the SIR for the kth user as 

$$\gamma_k = \frac{g_k P_k}{n + \rho \sum_{j \neq k} g_j P_j}, \qquad k = 1, \ldots, K, \tag{14.5}$$

where $P_k$ is the power of the kth transmitter, $n$ is the receiver noise power, and $\rho$ is the interference reduction due to 
signal processing. For example, in a CDMA uplink the interference power is reduced by the processing gain of the 
code, so $\rho \approx 1/G$ for G the processing gain, whereas in TDMA $\rho = 1$. 

Each link is assumed to have a minimum SIR requirement $\gamma_k^* > 0$. This constraint can be represented in 
matrix form with component-wise inequalities as 

$$(I - F)P \geq u \quad \text{with } P > 0, \tag{14.6}$$

where $P = (P_1, P_2, \ldots, P_K)^T$ is the column vector of transmitter powers, 

$$u = \left( \frac{\gamma_1^* n}{g_1}, \frac{\gamma_2^* n}{g_2}, \ldots, \frac{\gamma_K^* n}{g_K} \right)^T \tag{14.7}$$

is the column vector of noise power scaled by the SIR constraints and channel gain, and F is a matrix with 

$$F_{kj} = \begin{cases} 0, & \text{if } k = j \\ \dfrac{\gamma_k^* g_j \rho}{g_k}, & \text{if } k \neq j \end{cases} \tag{14.8}$$

with $k, j = 1, 2, \ldots, K$. 

The matrix F has non-negative elements and is irreducible. Let $\rho_F$ be the Perron-Frobenius eigenvalue of 
F. This is the maximum modulus eigenvalue of F, and for F irreducible this eigenvalue is simple, real, and 
positive. Moreover, from the Perron-Frobenius theorem and standard matrix theory [33], the following statements 
are equivalent: 

1. $\rho_F < 1$ 

2. There exists a vector $P > 0$ (i.e. $P_k > 0$ for all k) such that $(I - F)P \geq u$ 

3. $(I - F)^{-1}$ exists and is positive componentwise. 

Furthermore, if any of the above conditions holds we also have that $P^* = (I - F)^{-1} u$ is the Pareto optimal 
solution to (14.6). That is, if P is any other solution to (14.6) then $P \geq P^*$ componentwise. Hence, if the SIR 
requirements for all users can be met simultaneously, the best power allocation is $P^*$ so as to minimize the transmit 
power of the users. 

In [32] the authors also show that the following iterative power control algorithm converges to $P^*$ when 
$\rho_F < 1$, and diverges to infinity otherwise. This iterative Foschini-Miljanic algorithm is given by 

$$P(i + 1) = F P(i) + u, \tag{14.9}$$

for $i = 1, 2, 3, \ldots$. Furthermore, the above algorithm can be simplified to a per-user version as follows. Let 

$$P_k(i + 1) = \frac{\gamma_k^*}{\gamma_k(i)} P_k(i), \tag{14.10}$$

for each link $k \in \{1, 2, \ldots, K\}$, where $\gamma_k(i)$ is the SIR of link k under the power vector $P(i)$. Hence, each 
transmitter increases power when its SIR is below its target and 
decreases power when its SIR exceeds its target. SIR measurements or a function of them such as BER are 
typically made at the base station or access points, and a simple “up” or “down” command regarding transmit 
power can be fed back to each of the transmitters to perform the iterations. It is easy to show that (14.9) and 
(14.10) are pathwise equivalent and hence the per-user version of the power control algorithm also converges to 
$P^*$. The feasible region of power vectors that achieve the SIR targets for a two-user system along with the iterative 
algorithm that converges to the minimum power vector in this region is illustrated in Figure 14.8. We see in this 
figure that the feasible region consists of all power pairs $P = (P_1, P_2)$ that achieve a given pair of SIR targets, and 
the optimal pair $P^*$ is the minimum power vector in this two-dimensional region. 



Figure 14.8: Iterative Foschini-Miljanic Algorithm. 
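
As a rough illustration of the per-user update (14.10), the following sketch iterates the powers of a two-user uplink until the SIR targets are met. The gains, targets, noise power, and value of ρ are made-up numbers chosen so that the targets are feasible, and the function names are hypothetical. The SIR expression assumes the uplink model described above, with a single receiver and noise power n.

```python
def sir(powers, gains, noise, rho):
    """SIR of each user: g_k P_k / (n + rho * sum_{j != k} g_j P_j)."""
    total = sum(g * p for g, p in zip(gains, powers))
    return [g * p / (noise + rho * (total - g * p)) for g, p in zip(gains, powers)]


def foschini_miljanic(gains, targets, noise, rho, p0, iterations=200):
    """Per-user update: P_k(i+1) = (target_k / SIR_k(i)) * P_k(i).

    The starting powers p0 must be strictly positive, otherwise the
    measured SIR is zero and the update is undefined.
    """
    powers = list(p0)
    for _ in range(iterations):
        gammas = sir(powers, gains, noise, rho)
        powers = [t / g * p for t, g, p in zip(targets, gammas, powers)]
    return powers


# Illustrative two-user uplink (all numbers are assumptions, not from the text).
gains, targets, noise, rho = [1.0, 0.5], [2.0, 1.0], 0.1, 0.2
p_star = foschini_miljanic(gains, targets, noise, rho, p0=[1.0, 1.0])
print(p_star)
print(sir(p_star, gains, noise, rho))
```

For these numbers the off-diagonal entries of F are 0.2 and 0.4, so its Perron-Frobenius eigenvalue is about 0.28 and the iteration converges. Each user needs only its own SIR measurement, which is what makes the per-user form attractive in practice.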

The Foschini-Miljanic power control algorithm can also be combined with access control [28]. In this com- 
bination, access to the system is based on whether the new user causes other users to fall below their SINR targets. 
Specifically, when a new user requests access to the system, the base station or access point determines if a set 
of transmit powers exists such that he can be admitted without degrading existing users below their desired SINR 
threshold. If the new user cannot be accommodated in the system without violating the SINR requirements of 
existing users then he is denied access. If he can be accommodated then the power control algorithms of the new 
and existing users are set to the feasible power vector under which all users (new and existing) meet their SINR 
targets. 
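
One way to sketch the admission test is to check whether the Perron-Frobenius eigenvalue of F stays below 1 once the new user is added. This is an assumption-laden illustration rather than the procedure of [28]. It ignores any maximum-power limit, the helper names are hypothetical, and the eigenvalue is estimated by power iteration on I + F (used instead of F because F has a zero diagonal and plain power iteration on it can oscillate).

```python
def build_f(gains, targets, rho):
    """Matrix F with F[k][j] = target_k * g_j * rho / g_k for k != j, else 0."""
    k = len(gains)
    return [[0.0 if a == b else targets[a] * gains[b] * rho / gains[a]
             for b in range(k)] for a in range(k)]


def perron_eigenvalue(f, iterations=500):
    """Estimate the largest eigenvalue of F via power iteration on I + F."""
    k = len(f)
    x = [1.0] * k
    lam = 1.0
    for _ in range(iterations):
        y = [x[a] + sum(f[a][b] * x[b] for b in range(k)) for a in range(k)]
        lam = max(y)                 # converges to 1 + rho_F
        x = [v / lam for v in y]     # renormalize so max(x) == 1
    return lam - 1.0


def can_admit(gains, targets, rho, new_gain, new_target):
    """Admit the new user only if all SIR targets remain jointly feasible."""
    f = build_f(gains + [new_gain], targets + [new_target], rho)
    return perron_eigenvalue(f) < 1.0


# Illustrative numbers only.
print(can_admit([1.0, 0.5], [2.0, 1.0], 0.2, new_gain=0.8, new_target=1.0))
print(can_admit([1.0, 0.5], [2.0, 1.0], 0.2, new_gain=0.8, new_target=20.0))
```

The noise power does not enter the test: it scales the required powers through u but does not affect whether a feasible power vector exists.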

A power control strategy for multiple access that takes into account delay constraints is proposed and analyzed 
in [29]. This strategy optimizes the transmit power relative to both channel conditions and the delay constraints 
via dynamic programming. The optimal strategy exhibits three modes: very low transmit power when the channel 
is poor and the tolerable delay large, higher transmit power when the channel and delay are average, and very high 
transmit power when the delay constraint is tight. This strategy exhibits significant power savings over constant 
transmit power while meeting the delay constraints of the traffic. 

14.5 Downlink (Broadcast) Channel Capacity 

When multiple users share the same channel, the channel capacity can no longer be characterized by a single 
number. At the extreme, if only one user occupies all signaling dimensions in the channel then the region reduces 
to the single-user capacity described in Chapter 4. However, since there is an infinite number of ways to divide the 
channel between many users, the multiuser channel capacity is characterized by a rate region, where each point 
in the region is a vector of achievable rates that can be maintained by all the users simultaneously with arbitrarily 
small error probability. The union of achievable rate vectors under all multiuser transmission strategies is called 
the capacity region of the multiuser system. The channel capacity is different for uplink channels and downlink 
channels due to the fundamental differences between these channel models. However, the fact that downlink and 
uplink channels look like mirror-images of each other implies that there might be a connection between their 
capacities. In fact, there is a duality between these channels that allows the capacity region of either channel to 
be obtained from the capacity region of the other. Note that in the analysis of channel capacity the downlink is 
commonly referred to as the broadcast channel (BC) and the uplink is commonly referred to as the multiple access 
channel (MAC),$^5$ and we will use this terminology in our capacity discussions. In this section we describe the 
capacity region of the BC, Section 14.6 treats the MAC capacity region, and Section 14.7 characterizes the duality 
between these two channels and how it can be exploited in capacity calculations. 

After first describing the AWGN BC model, we will characterize its rate region using superposition code- 
division (CD) with successive interference cancellation, time-division (TD), and frequency-division (FD). We then 
obtain the rate regions using DSSS for orthogonal and non-orthogonal codes. The BC and corresponding capacity 
results under fading are also treated. 

We will see that capacity is achieved using superposition CD with interference cancellation. In addition, DSSS 
with successive interference cancellation has a capacity penalty relative to superposition coding which increases 
with spreading gain. Finally, spread spectrum with orthogonal CD can achieve a subset of the TD and FD capacity 
regions, but spread spectrum with non-orthogonal coding and no interference cancellation is inferior to all the other 
spectrum-sharing techniques. The capacity regions in fading depend on what is known about the fading channel at 
the transmitter and receiver, analogous to single-user capacity in fading. 

14.5.1 Channel Model 

We consider a BC consisting of one transmitter sending different data streams, also called independent information 
or data, to different receivers. Thus, our model is not applicable to a typical radio or TV broadcast channel, where 
the same data stream, also called common information or data, is received by all users. However, the capacity results 
easily extend to include common data as described in Section 14.5.3. The capacity region of the BC characterizes 
the rates at which information can be conveyed to the different receivers simultaneously. We mainly focus on 
capacity regions for the two-user BC, since the general properties and the relative performance of the different 
spectrum-sharing techniques are the same for any finite number of users [35]. 

The two-user BC has one transmitter and two distant receivers receiving data at rate $R_k$, $k = 1, 2$. The channel 
power gain between the transmitter and kth receiver is $g_k$, $k = 1, 2$, and each receiver has AWGN of PSD $N_0/2$. 
We define the effective noise on the kth channel as $n_k = N_0/g_k$, $k = 1, 2$, and we arbitrarily assume that $n_1 \leq n_2$, 
i.e. we assume the first user has a larger channel gain to its receiver than the second user. Incorporating the channel 
gains into the noise PSD does not change the SINR for any user, since the signal and interference on each user’s 
channel are attenuated by the same channel gain. Thus, the BC capacity with channel gains $\{g_k\}$ is the same as the 
BC capacity based on the effective noises $\{n_k\}$ [41]. The fact that the channel gains or, equivalently, the effective 
noise of the users can be ordered makes the channel model a degraded broadcast channel, for which a general 
formula for channel capacity is known [34, Chapter 14.6]. We denote the transmitter’s total average power and 
bandwidth by P and B, respectively. 

$^5$ MAC is also used as an abbreviation for the medium access control layer in networks [1, Chapter 1.2]. 

If the transmitter allocates all the power and bandwidth to one of the users, then clearly the other user will 
receive a rate of zero. Therefore, the set of simultaneously achievable rates $(R_1, R_2)$ includes the pairs $(C_1, 0)$ and 
$(0, C_2)$, where 

$$C_k = B \log_2 \left( 1 + \frac{P}{n_k B} \right), \qquad k = 1, 2, \tag{14.11}$$

is the single-user capacity in bps for an AWGN channel, as given in Chapter 4.1. These two points bound the BC 
capacity region. We now consider rate pairs in the interior of the region, which are achieved using more equitable 
methods of dividing the system resources. 

14.5.2 Capacity in AWGN 



In this section we compute the set of achievable rate vectors of the AWGN BC under TD, FD, and the optimal 
method of superposition coding, which achieves capacity. In TD, the transmit power P and bandwidth B are 
allocated to user 1 for a fraction $\tau$ of the total transmission time, and then to user 2 for the remainder of the 
transmission. This TD scheme achieves a straight line between the points $C_1$ and $C_2$, corresponding to the rate 
pairs 

$$\mathcal{C}_{TD} = \bigcup_{\{\tau:\ 0 \leq \tau \leq 1\}} \left( R_1 = \tau B \log_2 \left( 1 + \frac{P}{n_1 B} \right),\ R_2 = (1 - \tau) B \log_2 \left( 1 + \frac{P}{n_2 B} \right) \right). \tag{14.12}$$

This equal-power TD achievable rate region is illustrated in Figures 14.10 and 14.11. In these figures, $n_1 B$ and 
$n_2 B$ differ by 3 dB and 20 dB, respectively. This dB difference, which reflects the difference in the channel 
gains of the two users, is a crucial parameter in comparing the achievable rates of the different spectrum-sharing 
techniques, as we discuss in more detail below. 
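
The equal-power TD line is easy to trace numerically. The sketch below evaluates the single-user capacities and a few points on the line. The function names are hypothetical, and the parameter values are only illustrative (they match those used in Example 14.5 later in this section).

```python
import math


def awgn_capacity(power, noise_psd, bandwidth):
    """Single-user AWGN capacity in bps: B log2(1 + P / (n B))."""
    return bandwidth * math.log2(1.0 + power / (noise_psd * bandwidth))


def td_rate_pair(tau, power, n1, n2, bandwidth):
    """Equal-power TD: user 1 gets a fraction tau of the time, user 2 the rest."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    c1 = awgn_capacity(power, n1, bandwidth)
    c2 = awgn_capacity(power, n2, bandwidth)
    return tau * c1, (1.0 - tau) * c2


# Illustrative values: P = 10 mW, n1 = 1e-9 W/Hz, n2 = 1e-8 W/Hz, B = 100 kHz.
P, N1, N2, B = 10e-3, 1e-9, 1e-8, 100e3
for tau in (0.0, 0.25, 0.5, 0.75, 1.0):
    r1, r2 = td_rate_pair(tau, P, N1, N2, B)
    print(f"tau={tau:.2f}  R1={r1:.3e} bps  R2={r2:.3e} bps")
```

The end points tau = 1 and tau = 0 reproduce the single-user pairs (C1, 0) and (0, C2).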

If we also vary the average transmit power of each user then we can obtain a larger set of achievable rates. Let 
$P_1$ and $P_2$ denote the average power allocated to users 1 and 2, respectively, over their assigned time slots. The 
average power constraint then becomes $\tau P_1 + (1 - \tau) P_2 = P$. The achievable rate region with TD and variable 
power allocation is then 

$$\mathcal{C}_{TD,VP} = \bigcup_{\{\tau, P_1, P_2:\ 0 \leq \tau \leq 1;\ \tau P_1 + (1 - \tau) P_2 = P\}} \left( R_1 = \tau B \log_2 \left( 1 + \frac{P_1}{n_1 B} \right),\ R_2 = (1 - \tau) B \log_2 \left( 1 + \frac{P_2}{n_2 B} \right) \right). \tag{14.13}$$

In FD the transmitter allocates $P_k$ of its total power P and $B_k$ of its total bandwidth B to user k. The power 
and bandwidth constraints require that $P_1 + P_2 = P$ and $B_1 + B_2 = B$. The set of achievable rates for a fixed 
frequency division $(B_1, B_2)$ is thus 



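$$\mathcal{C}_{FFD} = \bigcup_{\{P_1, P_2:\ P_1 + P_2 = P\}} \left( R_1 = B_1 \log_2 \left( 1 + \frac{P_1}{n_1 B_1} \right),\ R_2 = B_2 \log_2 \left( 1 + \frac{P_2}{n_2 B_2} \right) \right). \tag{14.14}$$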



It was shown by Bergmans [35] that, for $n_1$ strictly less than $n_2$ and any fixed frequency division $(B_1, B_2)$, there 
exists a range of power allocations $\{P_1, P_2 : P_1 + P_2 = P\}$ whose corresponding rate pairs exceed a segment 
of the equal-power TD line (14.12). This superiority is illustrated in Figures 14.10 and 14.11, where we also plot 
the rate regions for fixed FD under two different bandwidth divisions. The superiority is difficult to distinguish in 
Figure 14.10, where the users have similar channel gains, but is much more apparent in Figure 14.11, where the 
users have a 20 dB difference in gain. 

The FD achievable rate region is defined as the union of fixed FD rate regions (14.14) over all bandwidth 
divisions: 

$$\mathcal{C}_{FD} = \bigcup_{\{P_1, P_2, B_1, B_2:\ P_1 + P_2 = P;\ B_1 + B_2 = B\}} \left( R_1 = B_1 \log_2 \left( 1 + \frac{P_1}{n_1 B_1} \right),\ R_2 = B_2 \log_2 \left( 1 + \frac{P_2}{n_2 B_2} \right) \right). \tag{14.15}$$

It was shown in [35] that this achievable rate region exceeds the equal-power TD rate region (14.12). This 
superiority is indicated by the closure of the fixed FD regions in Figures 14.10 and 14.11, although it is difficult to see in 
Figure 14.10, where the users have a similar received SNR. In fact, when $n_1 = n_2$, (14.15) reduces to (14.12) [35]. 
Thus, optimal power and/or frequency allocation is more beneficial when the users have very disparate channel 
quality. 

Note that the achievable rate region for TD with unequal power allocation given by (14.13) is the same as the 
FD achievable rate region (14.15). This is seen by letting $B_i = \tau_i B$ and $\pi_i = \tau_i P_i$ in (14.13), where $\tau_1 = \tau$ and 
$\tau_2 = 1 - \tau$. The power constraint then becomes $\pi_1 + \pi_2 = P$. Making these substitutions in (14.13) yields 

$$\mathcal{C}_{TD,VP} = \bigcup_{\{\pi_1, \pi_2, B_1, B_2:\ \pi_1 + \pi_2 = P;\ B_1 + B_2 = B\}} \left( R_1 = B_1 \log_2 \left( 1 + \frac{\pi_1}{n_1 B_1} \right),\ R_2 = B_2 \log_2 \left( 1 + \frac{\pi_2}{n_2 B_2} \right) \right). \tag{14.16}$$

Comparing this with (14.14) we see that with appropriate choice of $P_k$ and $\tau_k$, any point in the FD achievable rate 
region can also be achieved through TD with variable power. 
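
The substitution can be checked numerically. The sketch below picks an arbitrary time fraction and per-slot powers that satisfy the average power constraint, maps them to an FD bandwidth and power split, and confirms that both schemes give the same rate pair. All numeric values and function names are illustrative assumptions.

```python
import math


def td_vp_rates(tau, p1, p2, n1, n2, b):
    """TD with variable power: user 1 uses power p1 for a fraction tau of the time."""
    r1 = tau * b * math.log2(1.0 + p1 / (n1 * b))
    r2 = (1.0 - tau) * b * math.log2(1.0 + p2 / (n2 * b))
    return r1, r2


def fd_rates(b1, b2, pi1, pi2, n1, n2):
    """FD: user k gets bandwidth b_k and power pi_k."""
    r1 = b1 * math.log2(1.0 + pi1 / (n1 * b1))
    r2 = b2 * math.log2(1.0 + pi2 / (n2 * b2))
    return r1, r2


P, N1, N2, B = 10e-3, 1e-9, 1e-8, 100e3   # illustrative system parameters
tau, p1 = 0.3, 20e-3                      # arbitrary TD operating point
p2 = (P - tau * p1) / (1.0 - tau)         # enforce tau*P1 + (1 - tau)*P2 = P

td = td_vp_rates(tau, p1, p2, N1, N2, B)
fd = fd_rates(tau * B, (1.0 - tau) * B, tau * p1, (1.0 - tau) * p2, N1, N2)
print(td)
print(fd)
```

The two printed pairs agree up to floating-point rounding, since the ratio of power to bandwidth inside each logarithm is unchanged by the substitution.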

Superposition coding with successive interference cancellation is a multiresolution coding technique whereby 
the user with the higher channel gain can distinguish the fine resolution of the received signal constellation, while 
the user with the worse channel can only distinguish the constellation’s coarse resolution [35] [34, Chapter 14.6]. 
An example of a two-level superposition code constellation taken from [37] is 32-QAM with embedded 4-PSK, 
as shown in Figure 14.9. In this example, the transmitted constellation point is one of the 32-QAM signal points 
chosen as follows. The data stream intended for the user with the worse channel, user 2 in our model since $n_2 > n_1$, 
provides 2 bits to select one of the 4-PSK superpoints. The data stream intended for the user with the better SNR 
provides 3 bits to select one of the 8 constellation points surrounding the selected superpoint. After transmission 
through the channel, the user with the better SNR can easily distinguish the quadrant in which the constellation 
point lies. Thus, the 4-PSK superpoint is effectively subtracted out by this user. However, the user with the 
worse channel cannot distinguish between the 32-QAM points around its 4-PSK superpoints. Hence, the 32-QAM 
modulation superimposed on the 4-PSK modulation appears as noise to this user, and this user can only decode 
the 4-PSK. These ideas can be easily extended to multiple users using more complex signal constellations. Since 
superposition coding achieves multiple rates by expanding its signal constellation, it does not require bandwidth 
expansion. 

The two-user capacity region using superposition coding and successive interference cancellation was derived 
in [35] to be the set of rate pairs 

$$\mathcal{C}_{BC} = \bigcup_{\{P_1, P_2:\ P_1 + P_2 = P\}} \left( R_1 = B \log_2 \left( 1 + \frac{P_1}{n_1 B} \right),\ R_2 = B \log_2 \left( 1 + \frac{P_2}{n_2 B + P_1} \right) \right). \tag{14.17}$$

The intuitive explanation for (14.17) is the same as for the example illustrated in Figure 14.9: Since $n_1 < n_2$, 
user 1 correctly receives all the data transmitted to user 2. Therefore, user 1 can decode and subtract out user 2’s 
message, then decode its own message. User 2 cannot decode the message intended for user 1, since it has a worse 
channel. Thus, user 1’s message, with power $P_1$, contributes an additional noise term to user 2’s received message. 
This message can be treated as an additional AWGN term since the capacity-achieving distributions for the signals 
associated with each user are Gaussian [34, Chapter 14.1] [35]. This same process is used by the successive 
interference cancellation method for DSSS described in Chapter 13.4.4. However, although successive interference 
cancellation achieves the capacity region (14.17), it is not necessarily the best method to use in practice. The 
capacity analysis assumes perfect signal decoding, whereas real systems exhibit some decoding error. This error 
leads to decision-feedback errors in the successive interference cancellation scheme. Thus, multiuser detection 
methods that do not suffer from this type of error may work better in practice than successive cancellation. 

Figure 14.9: 32-QAM with embedded 4-PSK 

The rate region defined by (14.17) was shown in [36] to exceed the regions achievable through either TD or 
FD, when $n_1 < n_2$. Moreover, it was also shown in [36] that this is the maximum achievable set of rate pairs 
for any type of coding and spectrum sharing, and thus (14.17) defines the BC capacity region, hence the notation 
$\mathcal{C}_{BC}$. However, if the users all have the same SNR, then this capacity region collapses to the equal-power TD line 
(14.12). Thus, when $n_1 = n_2$, all the spectrum-sharing methods have the same rate region. 

The ideas of superposition coding are easily extended to a K-user system for K > 2. Assume a BC with K 
users, each with channel gain $g_k$. We first order the users relative to their effective noise $n_k = N_0/g_k$. Based 
on this effective noise ordering, the superposition coding will now have K levels, where the coarsest level can 
be detected by the user with the largest effective noise, the next level can be detected by the user with the next 
largest effective noise, and so forth. Each user can remove the effects of the constellation points associated with 
the noisier channels of other users, but the constellation points transmitted to users with better channels appear as 
noise. Assuming a total power constraint P, the multiuser extension to the two-user region (14.17) is given by 

$$\mathcal{C}_{BC} = \bigcup_{\{P_k:\ \sum_{k=1}^{K} P_k = P\}} \left\{ (R_1, \ldots, R_K) : R_k = B \log_2 \left( 1 + \frac{P_k}{n_k B + \sum_{j=1}^{K} P_j \mathbf{1}[n_k > n_j]} \right) \right\}, \tag{14.18}$$

where $\mathbf{1}[\cdot]$ denotes the indicator function. 
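
For a given power split, the rate vector in (14.18) can be evaluated directly: each user sees as interference only the power sent to users with strictly smaller effective noise. The sketch below uses a made-up three-user example with B = 1 so the numbers are easy to check by hand. The function name and all values are assumptions for illustration.

```python
import math


def superposition_rates(powers, noises, bandwidth):
    """Rates under superposition coding with successive interference cancellation.

    User k treats the signals of users with smaller effective noise as noise
    and cancels the signals of users with larger effective noise.
    """
    rates = []
    for pk, nk in zip(powers, noises):
        interference = sum(pj for pj, nj in zip(powers, noises) if nk > nj)
        rates.append(bandwidth * math.log2(1.0 + pk / (nk * bandwidth + interference)))
    return rates


# Hypothetical three-user split: effective noises 1, 2, 4 and powers 1, 2, 4.
rates = superposition_rates([1.0, 2.0, 4.0], [1.0, 2.0, 4.0], 1.0)
print(rates)  # log2(2), log2(5/3), log2(11/7)
```

With two users this reduces to the rate pair in (14.17).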

We define the sum-rate capacity of a BC as the maximum sum of rates taken over all rate vectors in the 
capacity region: 

$$\mathcal{C}_{BCSR} = \max_{(R_1, \ldots, R_K) \in \mathcal{C}_{BC}} \sum_{k=1}^{K} R_k. \tag{14.19}$$

Sum-rate capacity is a single number that defines the maximum throughput of the system regardless of fairness 
in terms of rate allocation between the users. It is therefore much easier to characterize than the K-dimensional 
capacity region, and often leads to important insights. In particular, it can be shown from (14.18) that sum-rate 
capacity is achieved on the AWGN BC by assigning all power P to the user with the highest channel gain or, 
equivalently, the lowest effective noise. Defining $n_{\min} = \min_k n_k$ and $g_{\max} = \max_k g_k$, this implies that the 
sum-rate capacity $\mathcal{C}_{BCSR}$ for the K-user AWGN BC is given by 

$$\mathcal{C}_{BCSR} = B \log_2 \left( 1 + \frac{P}{n_{\min} B} \right) = B \log_2 \left( 1 + \frac{g_{\max} P}{N_0 B} \right). \tag{14.20}$$

The sum-rate point is therefore one of the boundary points (14.11) of the capacity region, which is the same for 
superposition coding, TD, and FD, since all resources are assigned to a single user. 



Example 14.5: Consider an AWGN BC with total transmit power P = 10 mW, $n_1 = 10^{-9}$ W/Hz, $n_2 = 10^{-8}$ 
W/Hz, and B = 100 KHz. Suppose user 1 requires a data rate of 300 Kbps. Find the rate that can be allocated to 
user 2 under equal-power TD, equal-bandwidth FD, and superposition coding. 

Solution: In equal-power time division user 1 has a rate of $R_1 = \tau B \log_2 \left( 1 + \frac{P}{n_1 B} \right) = 6.658 \times 10^5 \tau$ bps. Setting 
$R_1$ to the desired value $R_1 = 6.658 \times 10^5 \tau = 3 \times 10^5$ bps and solving for $\tau$ yields $\tau = 3 \times 10^5 / 6.658 \times 10^5 = .451$. 
Then user 2 gets a rate of $R_2 = (1 - \tau) B \log_2 \left( 1 + \frac{P}{n_2 B} \right) = 1.90 \times 10^5$ bps. In equal-bandwidth FD we require 
$R_1 = .5B \log_2 \left( 1 + \frac{P_1}{.5 n_1 B} \right) = 3 \times 10^5$ bps. Solving for $P_1 = .5 n_1 B (2^{R_1/(.5B)} - 1)$ yields $P_1 = 3.15$ mW. Setting 
$P_2 = P - P_1 = 6.85$ mW, we get $R_2 = .5B \log_2 \left( 1 + \frac{P_2}{.5 n_2 B} \right) = 1.94 \times 10^5$ bps. Finally, with superposition 
coding we have $R_1 = B \log_2 \left( 1 + \frac{P_1}{n_1 B} \right) = 3 \times 10^5$. Solving for $P_1 = n_1 B (2^{R_1/B} - 1)$ yields $P_1 = .7$ mW. Then 

$$R_2 = B \log_2 \left( 1 + \frac{P - P_1}{n_2 B + P_1} \right) = 2.69 \times 10^5 \text{ bps}.$$

Clearly superposition coding is superior to both TD and FD, as expected, although the performance of these 
techniques would be closer to that of superposition coding if we optimized the power allocation for TD or the 
bandwidth allocation for FD. 
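
The three calculations in this example can be reproduced with a few lines of code. The sketch below is only a numerical check under the stated parameters. The function and variable names are hypothetical, and each logarithm is evaluated with its full 1 + SNR argument.

```python
import math

P, N1, N2, B = 10e-3, 1e-9, 1e-8, 100e3   # values stated in Example 14.5
R1_TARGET = 300e3                          # user 1 rate requirement in bps


def user2_rates():
    """Return user 2's rate under equal-power TD, equal-bandwidth FD, and superposition."""
    # Equal-power TD: find the time fraction tau that gives user 1 its rate.
    c1 = B * math.log2(1.0 + P / (N1 * B))
    c2 = B * math.log2(1.0 + P / (N2 * B))
    tau = R1_TARGET / c1
    r2_td = (1.0 - tau) * c2

    # Equal-bandwidth FD: each user gets B/2; solve for user 1's power.
    p1_fd = 0.5 * N1 * B * (2.0 ** (R1_TARGET / (0.5 * B)) - 1.0)
    r2_fd = 0.5 * B * math.log2(1.0 + (P - p1_fd) / (0.5 * N2 * B))

    # Superposition coding: user 2 sees user 1's power as extra noise.
    p1_sc = N1 * B * (2.0 ** (R1_TARGET / B) - 1.0)
    r2_sc = B * math.log2(1.0 + (P - p1_sc) / (N2 * B + p1_sc))
    return r2_td, r2_fd, r2_sc


for name, rate in zip(("TD", "FD", "superposition"), user2_rates()):
    print(f"{name}: R2 = {rate:.3e} bps")
```

The ordering of the three results illustrates the advantage of superposition coding for these channel conditions.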



Example 14.6: Find the sum-rate capacity for the system in the prior example. 

Solution: We have P = 10 mW, $n_1 = 10^{-9}$ W/Hz, $n_2 = 10^{-8}$ W/Hz, and B = 100 KHz. The minimum noise is 
associated with user 1, $n_{\min} = 10^{-9}$. Thus, $\mathcal{C}_{BCSR} = B \log_2 \left( 1 + \frac{P}{n_{\min} B} \right) = 6.658 \times 10^5$ bps, and this sum-rate 
is achievable with TD, FD, or superposition coding, which are all equivalent for this sum-rate capacity since all 
resources are allocated to the first user. 



CD for multiple users can also be implemented using DSSS, as discussed in Chapter 13.4. In such systems the 
modulated data signal for each user is modulated by a unique spreading code, which increases the transmit signal 
bandwidth by approximately G, the processing gain of the spreading code. For orthogonal spreading codes, the 
cross correlation between the respective codes is zero, and these codes require a spreading gain of N to produce N 
orthogonal codes. For a total bandwidth constraint B, the information bandwidth of each user’s signal with these 
spreading codes is thus limited to B/N. The two-user achievable rate region with these spreading codes is then 

$$\mathcal{C}_{DS,OC} = \bigcup_{\{P_1, P_2:\ P_1 + P_2 = P\}} \left( R_1 = \frac{B}{2} \log_2 \left( 1 + \frac{P_1}{n_1 B/2} \right),\ R_2 = \frac{B}{2} \log_2 \left( 1 + \frac{P_2}{n_2 B/2} \right) \right). \tag{14.21}$$

Comparing (14.21) with (14.14) we see that CD with orthogonal coding is the same as fixed FD with the bandwidth 
equally divided ($B_1 = B_2 = B/2$). From (14.16), TD with unequal power allocation can also achieve all points in 
this rate region. Thus, orthogonal CD with Walsh-Hadamard codes achieves a subset of the TD and FD achievable 
rate regions. More general orthogonal codes are needed to achieve the same region as these other techniques. 

We now consider DSSS with non-orthogonal spreading codes. As discussed in Chapter 13.4.2, in these 
systems interference between users is attenuated by the code cross correlation. Thus, if interference is treated as 
noise, its power contribution to the SIR is reduced by the square of the code cross correlation. From (13.6), we 
will assume that spreading codes with a processing gain of G reduce the interference power by 1/G. This is a 
reasonable approximation for random spreading codes, although as discussed in Chapter 14 the exact value of the 
interference power reduction depends on the nature of the spreading codes and other assumptions [38, 39]. Since 
the signal bandwidth is increased by G, the two-user BC rate region achievable through non-orthogonal DSSS and 
successive interference cancellation is given by 



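$$\mathcal{C}_{DS,SC,IC} = \bigcup_{\{P_1, P_2:\ P_1 + P_2 = P\}} \left( R_1 = \frac{B}{G} \log_2 \left( 1 + \frac{P_1}{n_1 B/G} \right),\ R_2 = \frac{B}{G} \log_2 \left( 1 + \frac{P_2}{n_2 B/G + P_1/G} \right) \right). \tag{14.22}$$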

By the concavity of the log function, the rate region defined by (14.22) for G > 1 is smaller than the rate region 
(14.17) obtained using superposition coding, and the degradation increases with increasing values of G. This 
implies that for nonorthogonal coding, the spreading gain should be minimized in order to maximize capacity. 
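
The capacity penalty from spreading can be seen by evaluating (14.22) for a fixed power split and increasing G. The sketch below does this for an illustrative operating point. The function name and the numeric values are assumptions, with the 1/G interference reduction taken from the approximation stated above.

```python
import math


def dsss_sic_rates(p1, p2, n1, n2, b, g):
    """Rate pair for non-orthogonal DSSS with successive interference cancellation.

    With g = 1 this coincides with the superposition-coding rate pair.
    """
    r1 = (b / g) * math.log2(1.0 + p1 / (n1 * b / g))
    r2 = (b / g) * math.log2(1.0 + p2 / (n2 * b / g + p1 / g))
    return r1, r2


# Illustrative operating point: P1 = 0.7 mW, P2 = 9.3 mW, B = 100 kHz.
for g in (1, 2, 4, 8, 16):
    r1, r2 = dsss_sic_rates(0.7e-3, 9.3e-3, 1e-9, 1e-8, 100e3, g)
    print(f"G={g:2d}  R1={r1:.3e} bps  R2={r2:.3e} bps")
```

Both rates shrink as G grows, because the factor 1/G in front of the logarithm outweighs the G-fold increase in SINR inside it.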

With non-orthogonal coding and no interference cancellation, the receiver treats all signals intended for other 
users as noise, resulting in the achievable rate region 

$$\mathcal{C}_{DS,SC} = \bigcup_{\{P_1, P_2:\ P_1 + P_2 = P\}} \left( R_1 = \frac{B}{G} \log_2 \left( 1 + \frac{P_1}{n_1 B/G + P_2/G} \right),\ R_2 = \frac{B}{G} \log_2 \left( 1 + \frac{P_2}{n_2 B/G + P_1/G} \right) \right). \tag{14.23}$$

Again using the log function concavity, G = 1 maximizes this rate region, and the rate region decreases as G 
increases. Moreover, the radius of curvature for (14.23) is given by 

$$\chi = \frac{\dot{R}_1 \ddot{R}_2 - \dot{R}_2 \ddot{R}_1}{\left( \dot{R}_1^2 + \dot{R}_2^2 \right)^{3/2}}, \tag{14.24}$$

where $\dot{R}_i$ and $\ddot{R}_i$ denote, respectively, the first and second derivatives of $R_i$ with respect to $\alpha$ for $P_1 = \alpha P$ 
and $P_2 = (1 - \alpha) P$. For G = 1, $\chi > 0$. Thus, the rate region for nonorthogonal coding without interference 
cancellation (14.23) is bounded by a convex function with end points $C_1$ and $C_2$, as shown in Figures 14.10 
and 14.11. Therefore, the achievable rate region for nonorthogonal CD without interference cancellation will lie 
beneath the regions for TD and FD, which are bounded by concave functions with the same endpoints. 
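
A quick numerical check of this ordering is to sweep the power split and confirm that every rate pair from (14.23) with G = 1 lies on or beneath the straight TD line joining the two single-user capacities. The sketch below does so for illustrative parameters. The function names and values are assumptions.

```python
import math


def capacity(p, n, b):
    """Single-user AWGN capacity in bps."""
    return b * math.log2(1.0 + p / (n * b))


def no_ic_rates(alpha, p, n1, n2, b, g=1.0):
    """Rate pair without interference cancellation, P1 = alpha*P, P2 = (1 - alpha)*P."""
    p1, p2 = alpha * p, (1.0 - alpha) * p
    r1 = (b / g) * math.log2(1.0 + p1 / (n1 * b / g + p2 / g))
    r2 = (b / g) * math.log2(1.0 + p2 / (n2 * b / g + p1 / g))
    return r1, r2


def td_line_margin(alpha, p, n1, n2, b):
    """1 - (R1/C1 + R2/C2): nonnegative when the point is on or beneath the TD line."""
    r1, r2 = no_ic_rates(alpha, p, n1, n2, b)
    return 1.0 - (r1 / capacity(p, n1, b) + r2 / capacity(p, n2, b))


P, N1, N2, B = 10e-3, 1e-9, 1e-8, 100e3   # illustrative parameters
for alpha in (0.0, 0.01, 0.5, 0.99, 1.0):
    print(f"alpha={alpha:.2f}  margin={td_line_margin(alpha, P, N1, N2, B):.4f}")
```

The margin is zero at the two end points and positive in between, so for these parameters equal-power TD dominates non-orthogonal CD without interference cancellation everywhere in the interior.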

The achievable rate regions for equal-power TD (14.12), FD (14.14), orthogonal CD (14.21), and nonorthogonal 
CD with (14.17) and without (14.23) interference cancellation are illustrated in Figures 14.10 and 14.11, where 
the SNR between the users differs by 3 dB and 20 dB, respectively. For the calculation of (14.23) we assume CD 
through superposition coding with G = 1. Spread spectrum CD with larger values of the spreading gain will result 
in a smaller rate region. 







14.5.3 Common Data 



In many broadcasting applications common data is sent to all users in the system. For example, television and radio 
stations broadcast the same data to all users, and in wireless Internet applications many users may want to download 
the same stock quotes and sports scores. The nature of superposition coding makes it straightforward to develop 
optimal broadcasting techniques for common data and to incorporate common data into the capacity region for the 
BC. In particular, for a two-user BC with superposition coding, the user with the better channel always receives 
the data intended for the user with the worse channel, along with his own data. Thus, since common data must be 
transmitted to both users, we can encode all common data as independent data intended for the user with the worse 
channel. Since the user with the better channel will also receive this data, it will be received by both users. 

Under this transmission strategy, if the rate pair $(R_1, R_2)$ is in the capacity region of the two-user BC with 
independent data defined by (14.17), for any $R_0 \leq R_2$ we can achieve the rate triple $(R_0, R_1, R_2 - R_0)$ for the BC 
with common and independent data, where $R_0$ is the rate of common data, $R_1$ is the rate of user 1’s independent 
data, and $R_2 - R_0$ is the rate of user 2’s independent data. Mathematically, this gives the three-dimensional capacity 
region 

$$\mathcal{C}_{BC} = \bigcup_{\{P_1, P_2:\ P_1 + P_2 = P\}} \left( R_0 \leq B \log_2 \left( 1 + \frac{P_2}{n_2 B + P_1} \right),\ R_1 = B \log_2 \left( 1 + \frac{P_1}{n_1 B} \right),\ R_2 = B \log_2 \left( 1 + \frac{P_2}{n_2 B + P_1} \right) - R_0 \right). \tag{14.25}$$



Example 14.7: In Example 14.5 we saw that for a broadcast channel with total transmit power P = 10 mW, 
$n_1 = 10^{-9}$ W/Hz, $n_2 = 10^{-8}$ W/Hz, and B = 100 KHz, the rate pair $(R_1, R_2) = (3 \times 10^5, 2.69 \times 10^5)$ is on the 
boundary of the capacity region. Suppose user 1 desires an independent data rate of 300 Kbps, and a common data 
rate of 100 Kbps is required for the system. At what rate can user 2 get independent data? 

Solution: In order for $R_1 = 300$ Kbps, we require the same $P_1 = .7$ mW as in Example 14.5. The common 
information rate $R_0 = 10^5 < 2.69 \times 10^5$, so from (14.25), the independent information rate to user 2 is just 
$R_2 - R_0 = 2.69 \times 10^5 - 10^5 = 1.69 \times 10^5$ bps. 
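
The bookkeeping in this example can be written as a small helper that splits the weaker user's superposition rate into common and independent parts. The function name is hypothetical, and the sketch assumes, as in the text, that user 1 has the better channel so that it also decodes everything sent to user 2.

```python
import math


def common_data_triple(p1, p2, r0, n1, n2, b):
    """Return (R0, R1, user 2's independent rate) for a two-user BC with common data."""
    r1 = b * math.log2(1.0 + p1 / (n1 * b))
    # Total rate decodable by the weaker user, who sees user 1's power as noise.
    r2_total = b * math.log2(1.0 + p2 / (n2 * b + p1))
    if r0 > r2_total:
        raise ValueError("common rate exceeds what the weaker user can decode")
    return r0, r1, r2_total - r0


# Parameters of Example 14.7: P1 = 0.7 mW, P2 = 9.3 mW, R0 = 100 Kbps.
print(common_data_triple(0.7e-3, 9.3e-3, 100e3, 1e-9, 1e-8, 100e3))
```

Raising the common rate trades off one-for-one against user 2's independent rate and leaves user 1's independent rate unchanged.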



14.5.4 Capacity in Fading 

We now consider the capacity region of BCs with fading, where the users have independent random channel 
gains that change over time. As described in Chapter 4.2 for single-user channels, the capacity of fading BCs 
depends on what is known about the channel at the transmitter and receiver. However, capacity of a BC is only 
known for degraded BCs, and this model requires that the channel gains are known to both the transmitter and 
receiver. Moreover, superposition coding cannot be used without transmitter knowledge of the channel gains, 
since if the transmitter does not know the relative channel gains it does not know which user can receive the coarse 
constellation point and which can receive the fine one. Thus we will only consider fading BCs where there is 
perfect channel side information (CSI) about the instantaneous channel gains at both the transmitter and at all 
receivers. We also assume that the channel is slowly fading so that for a given fading state, the coding strategy 
that achieves any point in the capacity region for the static BC with this state has sufficient time to drive the error
probability close to zero before the channel gains change.⁶

As with the single-user fading channel, there are two notions of capacity for multiuser fading channels with
perfect CSI: ergodic (Shannon) capacity and outage capacity. The ergodic capacity region of a BC characterizes the 
achievable rate vectors averaged over all fading states [40, 41] while the outage capacity region dictates the set of 
fixed rate vectors that can be maintained in all fading states subject to a given outage probability [42, 43, 44]. Zero- 
outage capacity refers to outage capacity with zero outage probability [42], i.e. the set of fixed rate vectors that 
can be maintained in all fading states. The ergodic capacity region, analogous to ergodic capacity for single-user 
systems, defines the data rate vectors that can be maintained over time without any constraints on delay. Hence, in 
some fading states, the data rate may be small or zero, which can be problematic for delay-constrained applications 
like voice or video. The outage capacity region, analogous to outage capacity in single-user systems, forces a fixed 
rate vector in all nonoutage fading states, which is perhaps a more appropriate capacity metric for delay-constrained 
applications. However, the requirement to maintain a fixed rate even in very deep fades can severely decrease the 
outage capacity region relative to the ergodic capacity region. In fact, the zero-outage capacity region when all 
users exhibit Rayleigh fading is zero for all users. 

We consider a BC with AWGN and fading where a single transmitter communicates independent information
to K users over bandwidth B with average transmit power P. The transmitter and all receivers have a single
antenna. The time-varying power gain of user k’s channel at time i is $g_k[i]$. Each receiver has AWGN with PSD
$N_0/2$. We define the effective time-varying noise of the kth user as⁷ $n_k[i] = N_0/g_k[i]$. The effective noise vector
at time i is defined as

$$\mathbf{n}[i] = (n_1[i], \ldots, n_K[i]). \tag{14.26}$$



We also call this the fading state at time i, since it characterizes the channel gains $g_k[i]$ associated with each user
at time i. We will denote the kth element of this vector as $n_k[i]$ or just $n_k$ when the time reference is clear. As
with the static channel, the capacity of the fading BC can be computed based on its time-varying channel gains or
its time-varying effective noise vector. The ergodic BC capacity region is defined as the set of all average rates
achievable in a fading channel with arbitrarily small probability of error, where the average is taken with respect
to all fading states. In [41], the ergodic capacity region and optimal power allocation scheme for the fading BC
is found by decomposing the fading channel into a parallel set of static BCs, one for every possible fading state
$\mathbf{n} = (N_0/g_1, \ldots, N_0/g_K)$. In each fading state, the channel can be viewed as a static AWGN BC, and time,
frequency, or code division techniques can be applied to the channel in each fading state.

Since the transmitter and all receivers know n[i], superposition coding according to the ordering of the current 
effective noise vector can be used by the transmitter. Each receiver can perform successive decoding in which the 
users with larger effective noise are decoded and subtracted off before decoding the desired signal. Furthermore, 
the power transmitted to each user $P_j(\mathbf{n})$ is a function of the current fading state. Since the transmission scheme
is based on superposition coding, it only remains to determine the optimal power allocation across users and over
time.



We define a power policy $\mathcal{P}$ over all possible fading states as a function that maps from any fading state $\mathbf{n}$ to
the transmitted power $P_k(\mathbf{n})$ for each user. Let $\mathcal{F}_{BC}$ denote the set of all power policies satisfying average power
constraint P:

$$\mathcal{F}_{BC} = \left\{ \mathcal{P} : E_{\mathbf{n}}\left[ \sum_{k=1}^{K} P_k(\mathbf{n}) \right] \le P \right\}. \tag{14.27}$$



6 More precisely, the coding strategy that achieves a point in the AWGN BC capacity region uses a block code, and the error probability
of the code goes to zero with blocklength. Our slow fading assumption presumes that the channel gains stay constant long enough for the
block code associated with these gains to drive the error probability close to zero.

7 Notice that the noise vector is the instantaneous power of the noise and not the instantaneous noise sample.

From (14.18), the capacity region assuming a constant fading state $\mathbf{n}$ with power allocation $P(\mathbf{n}) = \{P_k(\mathbf{n}) : k = 1, \ldots, K\}$ is given by

$$\mathcal{C}_{BC}(P(\mathbf{n})) = \left\{ \left(R_1(P(\mathbf{n})), \ldots, R_K(P(\mathbf{n}))\right) : R_k(P(\mathbf{n})) = B\log_2\left(1 + \frac{P_k(\mathbf{n})}{n_kB + \sum_{j=1}^{K} P_j(\mathbf{n})\,\mathbf{1}[n_k > n_j]}\right) \right\}. \tag{14.28}$$

Let $\mathcal{C}_{BC}(\mathcal{P})$ denote the set of achievable rates averaged over all fading states for power policy $\mathcal{P}$:

$$\mathcal{C}_{BC}(\mathcal{P}) = \left\{ R_k : R_k \le E_{\mathbf{n}}\left[R_k(P(\mathbf{n}))\right],\; k = 1, 2, \ldots, K \right\},$$

where $R_k(P(\mathbf{n}))$ is as given in (14.28). From [41], the ergodic capacity region of the BC with perfect CSI and
power constraint P is:

$$\mathcal{C}_{BC}(P) = \bigcup_{\mathcal{P} \in \mathcal{F}_{BC}} \mathcal{C}_{BC}(\mathcal{P}). \tag{14.29}$$

It is further shown in [41] that the region $\mathcal{C}_{BC}(P)$ is convex, and that the optimal power allocation scheme is an
extension of water-filling with K different water-levels for a K-user system.

We can also define achievable rate vectors for TD or FD, although these will clearly lie inside the ergodic 
capacity region, since superposition coding outperforms both of these techniques in every fading state. The optimal 
form of TD adapts the power assigned to each user relative to the current fading state. Similarly, the optimal form 
of FD adapts the bandwidth and power assigned to each user relative to the current fading state. As described 
in Section 14.5.2, for each fading state varying the power in TD yields the same rates as varying the power and 
bandwidth in FD. Thus, the achievable rates for these two techniques averaged over all fading states are the same. 
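Focusing on the FD region, assume a power policy $\mathcal{P} \in \mathcal{F}_{BC}$ that assigns power $P_k(\mathbf{n})$ to the kth user in fading
state $\mathbf{n}$. From (14.27), a power policy $\mathcal{P} \in \mathcal{F}_{BC}$ satisfies the average power constraint. Also assume a bandwidth
policy $\mathcal{B}$ that assigns bandwidth $B_k(\mathbf{n})$ to user k in state $\mathbf{n}$ and let $\mathcal{G}$ denote the set of all bandwidth policies
satisfying the bandwidth constraint of the system:

$$\mathcal{G} = \left\{ \mathcal{B} : \sum_{k=1}^{K} B_k(\mathbf{n}) = B \;\; \forall \mathbf{n} \right\}.$$

The set of achievable rates for FD under these policies is

$$\mathcal{C}_{FD}(\mathcal{P}, \mathcal{B}) = \left\{ R_k : R_k \le E_{\mathbf{n}}\left[R_k(P(\mathbf{n}), B(\mathbf{n}))\right],\; k = 1, 2, \ldots, K \right\}, \tag{14.30}$$

where

$$R_k(P(\mathbf{n}), B(\mathbf{n})) = B_k(\mathbf{n}) \log_2\left(1 + \frac{P_k(\mathbf{n})}{n_k B_k(\mathbf{n})}\right). \tag{14.31}$$

The set of all achievable rates under frequency division with perfect CSI subject to power constraint P and bandwidth
constraint B is then

$$\mathcal{C}_{FD}(P, B) = \bigcup_{\mathcal{P} \in \mathcal{F}_{BC},\, \mathcal{B} \in \mathcal{G}} \mathcal{C}_{FD}(\mathcal{P}, \mathcal{B}). \tag{14.32}$$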

The sum-rate capacity for fading BCs is defined as the maximum sum of achievable rates, maximized over all
rate vectors in the ergodic BC capacity region. Since sum-rate for the AWGN BC is maximized by transmitting
only to the user with the best channel, in fading sum-rate is maximized by transmitting only to the user with
the best channel in each channel state. Clearly superposition CD, TD, and FD are all equivalent in this setting,
since all resources are assigned to a single user in each state. We can compute the sum-rate capacity and the
optimal power allocation over time from an equivalent single-user fading channel with time-varying effective noise
$n[i] = \min_k n_k[i]$ and average power constraint P. From Chapter 4.2.4 the optimal power allocation to the user
with the best channel at time i is thus a water-filling in time, with cutoff value determined from the distribution of
$\min_k n_k[i]$.
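
The reduction to an equivalent single-user channel can be illustrated with a short numerical sketch. It assumes a fading distribution with finitely many states and known probabilities (a simplification made here, not in the text), and it finds the water level by bisection. The function name `sum_rate_capacity` and the example states are hypothetical.

```python
import math

def sum_rate_capacity(states, probs, P_avg, B):
    """Water-filling in time over the best user's effective noise.

    states: list of tuples (n_1, ..., n_K) of effective noise in W/Hz
    probs:  probability of each state (assumed to sum to 1)
    Returns (average sum rate in bps, power used in each state, best user per state).
    """
    n_min = [min(s) for s in states]            # equivalent single-user noise
    best = [s.index(min(s)) for s in states]    # only this user is served

    def avg_power(mu):
        # Power in a state is the water level minus the noise floor n*B, if positive.
        return sum(p * max(0.0, mu - n * B) for p, n in zip(probs, n_min))

    lo, hi = 0.0, P_avg + max(n_min) * B        # avg_power(hi) >= P_avg
    for _ in range(200):                        # bisection on the water level
        mu = 0.5 * (lo + hi)
        if avg_power(mu) > P_avg:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    powers = [max(0.0, mu - n * B) for n in n_min]
    rate = sum(p * B * math.log2(1 + w / (n * B))
               for p, w, n in zip(probs, powers, n_min))
    return rate, powers, best

# Hypothetical two-user example with three fading states.
states = [(1e-9, 1e-8), (1e-8, 1e-9), (1e-7, 1e-7)]
probs = [0.4, 0.4, 0.2]
rate, powers, best = sum_rate_capacity(states, probs, P_avg=10e-3, B=100e3)
print(best, [round(w * 1e3, 3) for w in powers], round(rate))
```

States in which even the best user has high effective noise receive less power, and none at all once their noise floor exceeds the water level.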

The ergodic capacity and achievable rate regions for fading broadcast channels under CD, TD, and FD are
computed in [41] for different fading distributions, along with the optimal adaptive resource allocation strategies
that achieve the boundaries of these regions. These adaptive transmission policies exploit multiuser diversity in
that more resources (power, bandwidth, timeslots) are allocated to the users with the best channels in any given
fading state. In particular, sum-rate capacity is achieved by allocating all resources in any given state to the user 
with the best channel. Multiuser diversity will be discussed in more detail in Section 14.8. 

The zero-outage BC capacity region defines the set of rates that can be simultaneously achieved for all users in 
all fading states while meeting the average power constraint. It is the multiuser extension of zero-outage capacity 
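defined in Chapter 4.2.4 for single-user channels. From [43], the power required to support a rate vector
$\mathbf{R} = (R_1, R_2, \ldots, R_K)$ in fading state $\mathbf{n}$ is:

$$P^{\min}(\mathbf{R}, \mathbf{n}) = \sum_{k=1}^{K-1} \left[ 2^{\sum_{j=k+1}^{K} R_{\pi(j)}/B} \left( 2^{R_{\pi(k)}/B} - 1 \right) n_{\pi(k)} B \right] + \left( 2^{R_{\pi(K)}/B} - 1 \right) n_{\pi(K)} B, \tag{14.33}$$

where $\pi(\cdot)$ is the permutation such that

$$n_{\pi(1)} < n_{\pi(2)} < \cdots < n_{\pi(K)}.$$

Therefore the zero-outage capacity region is the union of all rate vectors that meet the average power constraint:

$$\mathcal{C}^0_{BC}(P) = \bigcup_{\{\mathbf{R}:\, E_{\mathbf{n}}[P^{\min}(\mathbf{R},\mathbf{n})] \le P\}} \mathbf{R} = (R_1, R_2, \ldots, R_K). \tag{14.34}$$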

The boundary of the zero-outage capacity region is the set of all rate vectors R such that the power constraint is
met with equality. For the two-user BC with time-varying AWGN with powers $n_1$ and $n_2$, this boundary simplifies
to the set of all $(R_1, R_2)$ that satisfy the following equation [43]:

$$P = p(n_1 < n_2)\left[ E[n_1|n_1<n_2]\,2^{R_2/B}\left(2^{R_1/B}-1\right) + E[n_2|n_1<n_2]\left(2^{R_2/B}-1\right) \right] + p(n_1 \ge n_2)\left[ E[n_2|n_1\ge n_2]\,2^{R_1/B}\left(2^{R_2/B}-1\right) + E[n_1|n_1\ge n_2]\left(2^{R_1/B}-1\right) \right].$$

The boundary is determined solely by $E[n_1|n_1 < n_2]$, $E[n_2|n_1 < n_2]$, $E[n_1|n_1 \ge n_2]$, and $E[n_2|n_1 \ge n_2]$. This
is due to the fact that the power required to achieve a rate vector is a linear function of the noise levels in each 
state, as seen in (14.33). The zero-outage capacity region depends on the conditional expectations of the noises as 
opposed to their unconditional expectations since every different ordering of noises leads to a different expression 
for the required power in each state, as can be seen from (14.33). 
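
The required power in (14.33) is straightforward to evaluate numerically. The sketch below assumes a finite set of fading states with known probabilities (an assumption made here for illustration) and treats each $n_k$ as a noise PSD, so that the noise power is $n_kB$. The function names `min_power` and `average_min_power` are hypothetical.

```python
def min_power(rates, noise, B):
    """Minimum total power (14.33) to support `rates` (bps) in state `noise` (W/Hz)."""
    # pi(1) is the user with the least effective noise, pi(K) the one with the most.
    order = sorted(range(len(rates)), key=lambda k: noise[k])
    total = 0.0
    factor = 1.0   # 2^{sum_{j>k} R_pi(j)/B}, accumulated from the worst user inward
    for k in reversed(order):
        total += factor * (2 ** (rates[k] / B) - 1) * noise[k] * B
        factor *= 2 ** (rates[k] / B)
    return total

def average_min_power(rates, states, probs, B):
    """E_n[P^min(R, n)] over a finite set of fading states."""
    return sum(p * min_power(rates, s, B) for p, s in zip(probs, states))

# Hypothetical example: two equally likely states in which the users swap roles.
states = [(1e-9, 1e-8), (1e-8, 1e-9)]
probs = [0.5, 0.5]
P_needed = average_min_power([3e5, 2e5], states, probs, B=100e3)
print(f"average power needed: {P_needed * 1e3:.2f} mW")
```

A rate vector lies in the zero-outage region (14.34) for power constraint P exactly when this average does not exceed P. Because the user ordering changes from state to state, the average depends on the noise levels conditioned on the ordering, as noted above.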

The outage capacity region of the BC is defined similarly to the zero-outage capacity region, except that
users may have some nonzero probability of outage so that they can suspend transmission in some outage states.
This provides additional flexibility in the system since under severe fading conditions, maintaining a fixed rate in 
all fading states can consume a great deal of power. In particular, we saw in Chapter 4.2 that for a single-user 
fading channel, maintaining any non-zero fixed rate in Rayleigh fading requires infinite power. By allowing some 
outage, power can be conserved from outage states to maintain higher rates in non-outage states. The outage 
capacity region is more difficult to obtain than the zero-outage capacity region, since in any given fading state the 
transmission strategy must determine which users to put into outage. Once the outage users are determined, the
power required to maintain the remaining users is given by (14.33) for the rate vector associated with the $K' < K$
users that are not in outage. It is shown in [43] that this decision should be made based on a threshold policy,
and the resulting outage capacity region is then obtained implicitly based on the threshold policy and the power 
allocation (14.33) for non-outage users. 

The notions of ergodic capacity and outage capacity can also be combined. This combination results in the 
minimum rate capacity region [46]. This region characterizes the set of all average rate vectors that
can be maintained, averaged over all fading states, subject to some minimum rate vector that must be maintained
in all states (possibly subject to some outage probability). Minimum rate capacity is useful for systems supporting
a mix of delay-constrained and delay-unconstrained data. The minimum rates dictate the data rates available for
the constrained data that must be maintained in all fading states, while the rates above these minimums are what
is available for the unconstrained data, where these additional rates vary depending on the current fading state.
The minimum rate capacity region (with zero outage probability) lies between that of the zero-outage capacity 
region and the ergodic capacity region: for minimum rates of zero it equals the ergodic capacity region, and for 
minimum rates on the boundary of the zero-outage capacity region, it cannot exceed these boundary points. This 
is illustrated in Figure 14.12, where we plot the ergodic, zero-outage, and minimum rate capacity region for a BC 
with Rician fading. We see from this figure that the ergodic capacity region is the largest, since it can adapt to 
the different channel states to maximize its average rate, averaged over all fading states. The zero-outage capacity 
region is the smallest, since it is forced to maintain a fixed rate in all states, which consumes much power when 
the fading is severe. The minimum rate capacity region lies between the other two, and depends on the minimum 
rate requirements. As the minimum rate vector that must be maintained in all fading states increases, the minimum 
rate capacity region approaches the zero-outage capacity region, and as this minimum rate vector decreases, the 
minimum rate capacity region approaches the ergodic capacity region. 




Figure 14.12: Ergodic, Zero-Outage, and Minimum Rate BC Capacity Regions (Rician fading with a K factor of 
1, Average SNR = 10 dB) 



14.5.5 Capacity with Multiple Antennas 

We now investigate the capacity region for a BC with multiple antennas. We have seen in Chapter 10.3 that MIMO 
systems can provide large capacity increases for single-user systems. The same will be true of multiuser systems: 
in fact multiple users can exploit multiple spatial dimensions even more effectively than a single user.
Consider a K-user BC where the transmitter has $M_t$ antennas and each receiver has $M_r$ antennas. The
$M_r \times M_t$ channel matrix $\mathbf{H}_k$ characterizes the channel gains between each antenna at the transmitter and each
antenna at the kth receiver. The received signal for the kth user is then

$$\mathbf{y}_k = \mathbf{H}_k \mathbf{x} + \mathbf{n}_k, \tag{14.35}$$

where $\mathbf{x}$ is the input to the transmit antennas, and we denote its covariance matrix as $\boldsymbol{\Sigma}_x$. For simplicity, we
normalize the bandwidth to unity⁸, $B = 1$ Hz, and assume the noise vector $\mathbf{n}_k$ is a circularly symmetric complex
Gaussian with $\mathbf{n}_k \sim N(\mathbf{0}, \mathbf{I})$.

When the transmitter has more than one antenna, M t > 1, the BC is no longer degraded. In other words, 
receivers cannot generally be ranked by their channel quality since receivers have different channel gains associated 
with the different antennas at the transmitter. The capacity region of general non-degraded broadcast channels
is unknown. However, an achievable region for this channel was proposed in [54, 55] which was later shown
to equal the capacity region [58]. The region is based on the notion of dirty paper coding (DPC) [59]. The
basic premise of DPC is as follows. If the transmitter (but not the receiver) has perfect, non-causal knowledge
of interference to a given user, then the capacity of the channel is the same as if there were no interference or,
equivalently, as if the receiver had knowledge of the interference and could subtract it out. DPC is a technique that
allows non-causally known interference to be “pre-subtracted” at the transmitter but in such a way that the transmit 
power is not increased. A more practical (and more general) technique to perform this pre-subtraction is described 
in [60]. 

In the MIMO BC, DPC can be applied at the transmitter when choosing codewords for different users. The 
transmitter first picks a codeword for User 1. The transmitter then chooses a codeword for User 2 with full (non- 
causal) knowledge of the codeword intended for User 1. Therefore the codeword of User 1 can be pre-subtracted 
such that User 2 does not see the codeword intended for User 1 as interference. Similarly, the codeword for User 
3 is chosen such that User 3 does not see the signals intended for Users 1 and 2 as interference. This process 
continues for all K users. The ordering of the users clearly matters in such a procedure, and needs to be optimized 
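in the capacity calculation. Let $\pi(\cdot)$ denote a permutation of the user indices and $\boldsymbol{\Sigma} = [\boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_K]$ denote a set
of positive semi-definite covariance matrices with $\mathrm{Tr}(\boldsymbol{\Sigma}_1 + \cdots + \boldsymbol{\Sigma}_K) \le P$. Under DPC, if User $\pi(1)$ is encoded
first, followed by User $\pi(2)$, etc., then the following rate vector is achievable:

$$\mathbf{R}(\pi, \boldsymbol{\Sigma}):\quad R_{\pi(k)} = \log \frac{\left| \mathbf{I} + \mathbf{H}_{\pi(k)} \left( \sum_{j \ge k} \boldsymbol{\Sigma}_{\pi(j)} \right) \mathbf{H}_{\pi(k)}^H \right|}{\left| \mathbf{I} + \mathbf{H}_{\pi(k)} \left( \sum_{j > k} \boldsymbol{\Sigma}_{\pi(j)} \right) \mathbf{H}_{\pi(k)}^H \right|}, \quad k = 1, \ldots, K. \tag{14.36}$$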



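The capacity region $\mathcal{C}$ is then the convex hull of the union of all such rate vectors over all permutations and all
positive semi-definite covariance matrices satisfying the average power constraint:

$$\mathcal{C}_{BC}(P, \mathbf{H}) = \mathrm{Co}\left( \bigcup_{\pi, \boldsymbol{\Sigma}} \mathbf{R}(\pi, \boldsymbol{\Sigma}) \right), \tag{14.37}$$

where $\mathbf{R}(\pi, \boldsymbol{\Sigma})$ is given by (14.36). The transmitted signal is $\mathbf{x} = \mathbf{x}_1 + \ldots + \mathbf{x}_K$ and the input covariance
matrices are of the form $\boldsymbol{\Sigma}_k = E[\mathbf{x}_k \mathbf{x}_k^H]$. The DPC implies that $\mathbf{x}_1, \ldots, \mathbf{x}_K$ are uncorrelated, and thus
$\boldsymbol{\Sigma}_x = \boldsymbol{\Sigma}_1 + \ldots + \boldsymbol{\Sigma}_K$, with $\mathrm{Tr}(\boldsymbol{\Sigma}_x) \le P$.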
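
To make the role of the encoding order in (14.36) concrete, the following sketch evaluates the DPC rates for the special case of single-antenna receivers, where each determinant reduces to a scalar. It assumes real-valued channels and base-2 logarithms (rates in bits per channel use), and the covariance matrices are an arbitrary illustrative choice, not an optimized allocation. All function names are hypothetical.

```python
import math

def quad_form(h, S):
    """h S h^T for a real row vector h and a real symmetric matrix S."""
    n = len(h)
    return sum(h[i] * S[i][j] * h[j] for i in range(n) for j in range(n))

def mat_add(A, C):
    return [[a + c for a, c in zip(ra, rc)] for ra, rc in zip(A, C)]

def dpc_rates(H, Sigma, order):
    """Rates of (14.36) when every receiver has one antenna.

    H[k]:     user k's 1 x Mt channel, as a list
    Sigma[k]: user k's Mt x Mt covariance matrix
    order:    users listed from first encoded to last encoded
    """
    Mt = len(H[0])
    later = [[0.0] * Mt for _ in range(Mt)]   # sum of covariances encoded after this user
    rates = {}
    for k in reversed(order):
        interference = quad_form(H[k], later)  # only later-encoded users interfere
        signal = quad_form(H[k], Sigma[k])
        rates[k] = math.log2((1 + interference + signal) / (1 + interference))
        later = mat_add(later, Sigma[k])
    return rates

# Channels of Figure 14.13 with P = 10, split equally and beamformed along each
# user's own channel (an illustrative, non-optimized choice).
H = [[1.0, 0.5], [0.5, 1.0]]
Sigma = [[[4.0, 2.0], [2.0, 1.0]], [[1.0, 2.0], [2.0, 4.0]]]   # traces 5 + 5 = 10
print(dpc_rates(H, Sigma, order=[0, 1]))
print(dpc_rates(H, Sigma, order=[1, 0]))
```

The user encoded last sees no interference, while the user encoded first treats all later users as noise. Swapping the order therefore changes which user gets the interference-free rate.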

One important feature to notice about the rate equations defined by (14.36) is that these equations are neither
a concave nor convex function of the covariance matrices. This makes finding the capacity region very difficult,
because generally the entire space of covariance matrices that meet the power constraint must be searched over
[54, 55]. However, as described in Section 14.7, there is a duality between the MIMO BC and the MIMO MAC
that can be exploited to greatly simplify this calculation. The capacity region for a 2-user channel with M = 2 and
N = 1 computed by exploiting this duality is shown in Fig. 14.13. The region is defined by the outer boundary,
and the lines inside this boundary each correspond to the capacity region of a different dual MIMO MAC channel
whose sum power equals the power of the MIMO BC. The union of these dual regions yields the boundary of the
MIMO BC region, as will be discussed in Section 14.7.

8 Capacity of unity bandwidth MIMO channels has a factor of .5 preceding the log function for real (one-dimensional) channels with
no such factor for complex (two-dimensional) channels [47, Chapter 3.1].




Figure 14.13: MIMO BC capacity region, $\mathbf{H}_1 = [1 \; 0.5]$, $\mathbf{H}_2 = [0.5 \; 1]$, $P = 10$



14.6 Uplink (Multiple Access) Channel Capacity 

14.6.1 Capacity in AWGN 

The MAC consists of K transmitters, each with power $P_k$, sending to a receiver over a channel with power gain $g_k$.
We assume all transmitters and the receiver have a single antenna. The received signal is corrupted by AWGN with
PSD $N_0/2$. The two-user multiaccess capacity region is the closed convex hull of all vectors $(R_1, R_2)$ satisfying
the following constraints [34]:

$$R_k \le B\log_2\left(1 + \frac{g_kP_k}{N_0B}\right), \quad k = 1, 2 \tag{14.38}$$

and

$$R_1 + R_2 \le B\log_2\left(1 + \frac{g_1P_1 + g_2P_2}{N_0B}\right). \tag{14.39}$$

The first constraint (14.38) is just the capacity associated with each individual channel. The second constraint
(14.39) indicates that the sum of rates for all users cannot exceed the capacity of a “superuser” with received
power equal to the sum of received powers from all users. For K users, the region becomes

$$\mathcal{C}_{MAC} = \left\{ (R_1, \ldots, R_K) : \sum_{k \in S} R_k \le B\log_2\left(1 + \frac{\sum_{k \in S} g_kP_k}{N_0B}\right), \; \forall S \subset \{1, 2, \ldots, K\} \right\}. \tag{14.40}$$

Thus, the region (14.40) indicates that the sum of rates for any subset of the K users cannot exceed the capacity of
a superuser with received power equal to the sum of received powers associated with this user subset.

The sum-rate capacity of a MAC is the maximum sum of rates $\sum_{k=1}^{K} R_k$ where the maximum is taken over
all rate vectors $(R_1, \ldots, R_K)$ in the MAC capacity region. As with the sum-rate capacity of the BC, the MAC
sum-rate also measures the maximum throughput of the system regardless of fairness, and is easier to characterize
than the K-dimensional capacity region. It can be shown from (14.40) that sum-rate capacity is achieved on the
AWGN MAC by having all users transmit at their maximum power, which yields:

$$C_{MACSR} = B\log_2\left(1 + \frac{\sum_{k=1}^{K} g_kP_k}{N_0B}\right). \tag{14.41}$$



The intuition behind this result is that each user in the MAC has an individual power constraint, so not allowing 
a user to transmit at full power wastes system power. By contrast, the AWGN BC sum-rate capacity (14.20) is 
achieved by only transmitting to the user with the best channel. However, since all users share the power resource, 
no power is wasted in this case. 
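
The subset constraints in (14.40) can be checked directly by enumeration. The sketch below is a minimal illustration, assuming rates in bps, powers in watts, and $N_0$ in W/Hz; the function names `in_mac_region` and `mac_sum_rate` are hypothetical, and the numbers are illustrative.

```python
import math
from itertools import combinations

def in_mac_region(rates, gains, powers, N0, B, tol=1e-9):
    """True if `rates` satisfies every subset constraint of (14.40)."""
    K = len(rates)
    for size in range(1, K + 1):
        for S in combinations(range(K), size):
            rx_power = sum(gains[k] * powers[k] for k in S)
            limit = B * math.log2(1 + rx_power / (N0 * B))
            if sum(rates[k] for k in S) > limit + tol:
                return False
    return True

def mac_sum_rate(gains, powers, N0, B):
    """Sum-rate capacity (14.41): all users transmit at full power."""
    rx_power = sum(g * p for g, p in zip(gains, powers))
    return B * math.log2(1 + rx_power / (N0 * B))

# Illustrative two-user numbers.
gains, powers, N0, B = (0.08, 0.001), (0.1, 0.1), 1e-9, 100e3
print(in_mac_region([6.3e5, 1.7e3], gains, powers, N0, B))
print(in_mac_region([6.3e5, 1.0e5], gains, powers, N0, B))
print(round(mac_sum_rate(gains, powers, N0, B)))
```

The number of constraints grows as $2^K - 1$, which is one reason the scalar sum-rate (14.41) is easier to work with than the full region.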

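The MAC capacity region for two users is shown in Figure 14.14, where $C_k$ and $C_k^*$ are given by

$$C_k = B\log_2\left(1 + \frac{g_kP_k}{N_0B}\right), \quad k = 1, 2, \tag{14.42}$$

$$C_1^* = B\log_2\left(1 + \frac{g_1P_1}{N_0B + g_2P_2}\right), \tag{14.43}$$

and

$$C_2^* = B\log_2\left(1 + \frac{g_2P_2}{N_0B + g_1P_1}\right). \tag{14.44}$$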




Figure 14.14: Two-User MAC Capacity Region. 

The point $(C_1, 0)$ is the achievable rate vector when transmitter 1 is sending at its maximum rate and transmitter
2 is silent, and the opposite scenario achieves the rate vector $(0, C_2)$. The corner points $(C_1, C_2^*)$ and $(C_1^*, C_2)$
are achieved using the successive interference cancellation described above for superposition codes. Specifically,
let the first user operate at the maximum data rate $C_1$. Then its signal will appear as noise to user 2; thus, user 2
can send data at rate $C_2^*$ which can be decoded at the receiver with arbitrarily small error probability. If the receiver
then subtracts out user 2’s message from its received signal, the remaining message component is just user 1’s
message corrupted by noise, so rate $C_1$ can be achieved with arbitrarily small error probability. Hence, $(C_1, C_2^*)$
is an achievable rate vector. A similar argument with the user roles reversed yields the rate point $(C_1^*, C_2)$.
Time-sharing between these two strategies yields any point on the straight line connecting $(C_1, C_2^*)$ and $(C_1^*, C_2)$. Note
that in the broadcast channel the better user must always be decoded last, whereas in the MAC decoding can be
done in either order. This is a fundamental difference of the two channels.

TD between the two transmitters operating at their maximum rates, given by (14.42), yields any rate vector
on the straight line connecting $C_1$ and $C_2$. With FD, the rates depend on the fraction of the total bandwidth that is
allocated to each transmitter. Letting $B_1$ and $B_2$ denote the bandwidth allocated to each of the two users, we get
the achievable rate region

$$\mathcal{C}_{FD} = \bigcup_{\{B_1,B_2:\,B_1+B_2=B\}} \left( R_1 = B_1\log_2\left(1 + \frac{g_1P_1}{N_0B_1}\right),\; R_2 = B_2\log_2\left(1 + \frac{g_2P_2}{N_0B_2}\right) \right), \tag{14.45}$$

which is plotted in Figure 14.14. Clearly this region dominates TD, since setting $B_1 = \tau B$ and $B_2 = (1 - \tau)B$
in (14.45) has $R_1 > \tau C_1$ and $R_2 > (1 - \tau)C_2$. It can be shown [34] that this curve touches the capacity region
boundary at one point, and this point corresponds to the rate vector that maximizes the sum-rate $R_1 + R_2$. To
achieve this point, the bandwidths $B_1$ and $B_2$ must be proportional to their corresponding received powers $g_1P_1$
and $g_2P_2$.
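
The claim that FD reaches the sum-rate point when bandwidth is split in proportion to received power can be checked numerically. The sketch below evaluates (14.45) for one set of illustrative parameters; the function name `fd_rates` and the numbers are assumptions made for this example.

```python
import math

def fd_rates(B1, B, g, P, N0):
    """Rates of (14.45) when user 1 gets bandwidth B1 and user 2 gets B - B1."""
    B2 = B - B1
    R1 = B1 * math.log2(1 + g[0] * P[0] / (N0 * B1)) if B1 > 0 else 0.0
    R2 = B2 * math.log2(1 + g[1] * P[1] / (N0 * B2)) if B2 > 0 else 0.0
    return R1, R2

B, N0 = 100e3, 1e-9
g, P = (0.08, 0.001), (0.1, 0.1)
rx = [gk * Pk for gk, Pk in zip(g, P)]           # received powers g_k P_k

B1_opt = B * rx[0] / sum(rx)                     # bandwidth proportional to received power
R1, R2 = fd_rates(B1_opt, B, g, P, N0)
sum_capacity = B * math.log2(1 + sum(rx) / (N0 * B))   # (14.41) with K = 2
R1_half, R2_half = fd_rates(B / 2, B, g, P, N0)  # an equal split, for comparison

print(round(R1 + R2), round(sum_capacity), round(R1_half + R2_half))
```

With the proportional split both users see the same SNR, $(g_1P_1 + g_2P_2)/(N_0B)$, so the two rates add up to exactly the sum-rate capacity. Any other split, such as the equal one, falls short.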

As with the BC, we can obtain the same achievable rate region with TD as with FD by efficient use of the
transmit power. If we take the constraints $P_1$ and $P_2$ to be average power constraints, then since user k only uses
the channel a fraction $\tau_k$ of the time, its average power over that time fraction can be increased to $P_k/\tau_k$. The rate
region achievable through variable-power TD is then given by

$$\mathcal{C}_{TD,VP} = \bigcup_{\{\tau_1,\tau_2:\,\tau_1+\tau_2=1\}} \left( R_1 = \tau_1B\log_2\left(1 + \frac{g_1P_1}{N_0\tau_1B}\right),\; R_2 = \tau_2B\log_2\left(1 + \frac{g_2P_2}{N_0\tau_2B}\right) \right), \tag{14.46}$$

and substituting $B_k = \tau_kB$ in (14.46) yields the same rate region as in (14.45).

Superposition codes without successive decoding can also be used. With this approach, each transmitter’s 
message acts as noise to the others. Thus, the maximum achievable rate in this case cannot exceed $(C_1^*, C_2^*)$,
which is clearly dominated by FD and TD for some bandwidth or time allocations, in particular the allocation that 
intersects the rate region boundary. 



Example 14.8: Consider a MAC channel in AWGN with transmit power $P_1 = P_2 = 100$ mW for both users, and
channel gains $g_1 = .08$ for user 1 and $g_2 = .001$ for user 2. Assume the receiver noise has $N_0 = 10^{-9}$ W/Hz and
the system bandwidth is $B = 100$ KHz. Find the corner points of the MAC capacity region. Also find the rate that
user 1 can achieve if user 2 requires a rate of $R_2 = 100$ Kbps and of $R_2 = 50$ Kbps.

Solution: From (14.42)-(14.44) we have $C_1 = B\log_2\left(1 + \frac{g_1P_1}{N_0B}\right) = 6.34 \times 10^5$, $C_2 = B\log_2\left(1 + \frac{g_2P_2}{N_0B}\right) = 1 \times 10^5$,

$$C_1^* = B\log_2\left(1 + \frac{g_1P_1}{N_0B + g_2P_2}\right) = 5.36 \times 10^5,$$

and

$$C_2^* = B\log_2\left(1 + \frac{g_2P_2}{N_0B + g_1P_1}\right) = 1.77 \times 10^3.$$

The maximum rate for user 2 is 100 Kbps, so if user 2 requires $R_2 = 100$ Kbps, this rate point is associated with the
corner point $(C_1^*, C_2)$ of the capacity region, so user 1 can achieve a rate of $R_1 = C_1^* = 536$ Kbps. If user 2 requires
only $R_2 = 50$ Kbps then the rate point lies on the TD portion of the capacity region. In particular, timesharing as
$\tau(C_1, C_2^*) + (1 - \tau)(C_1^*, C_2)$, the timeshare value that yields $R_2 = 50$ Kbps satisfies $\tau C_2^* + (1 - \tau)C_2 = R_2$.
Solving for $\tau$ yields $\tau = (R_2 - C_2)/(C_2^* - C_2) = .51$, about halfway between the two corner points. Then user
1 can get rate $R_1 = \tau C_1 + (1 - \tau)C_1^* = 5.86 \times 10^5$. This example illustrates the dramatic impact of the near-far
effect in MAC channels. Even though both users have the same transmit power, the channel gain of user 2 is much
less than the gain of user 1. Hence, user 2 can achieve at most a rate of 100 Kbps, whereas user 1 can achieve a rate
between 536 and 634 Kbps. Moreover, the interference from user 2 does not have that much of an impact on user
1 due to the weak channel gain associated with the interference: user 1 sees data rates of $C_1 = 634$ Kbps without
interference and $C_1^* = 536$ Kbps with interference. However, the interference from user 1 severely limits the data
rate of user 2, decreasing it almost two orders of magnitude from $C_2 = 100$ Kbps to $C_2^* = 1.77$ Kbps.
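
The numbers in Example 14.8 can be reproduced with a few lines of code. This is a minimal sketch of the calculation described in the solution; the helper names `cap` and `user1_rate` are hypothetical, and the small tolerance on $C_2$ only guards against floating-point rounding.

```python
import math

B, N0 = 100e3, 1e-9
g1, g2 = 0.08, 0.001
P1 = P2 = 100e-3

def cap(signal, noise_plus_interference):
    """B log2(1 + SINR), the common form of (14.42)-(14.44)."""
    return B * math.log2(1 + signal / noise_plus_interference)

C1 = cap(g1 * P1, N0 * B)
C2 = cap(g2 * P2, N0 * B)
C1_star = cap(g1 * P1, N0 * B + g2 * P2)
C2_star = cap(g2 * P2, N0 * B + g1 * P1)

def user1_rate(R2):
    """Largest R1 on the capacity-region boundary for a required R2."""
    if R2 > C2 * (1 + 1e-9):
        raise ValueError("R2 exceeds user 2's single-user capacity")
    R2 = min(R2, C2)
    if R2 <= C2_star:
        return C1                          # user 2 fits under user 1's interference
    tau = (R2 - C2) / (C2_star - C2)       # timeshare between the two corner points
    return tau * C1 + (1 - tau) * C1_star

print(f"C1 = {C1:.3e}, C2 = {C2:.3e}, C1* = {C1_star:.3e}, C2* = {C2_star:.3e}")
print(f"R1 at R2 = 100 Kbps: {user1_rate(100e3):.3e}")
print(f"R1 at R2 = 50 Kbps:  {user1_rate(50e3):.3e}")
```

Note that $C_1 + C_2^* = C_1^* + C_2$, since both corner points lie on the sum-rate line (14.39). Every extra bit per second given to user 2 along that segment therefore costs user 1 exactly one bit per second.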



14.6.2 Capacity in Fading 

We now consider the capacity region of a MAC with AWGN and fading, where the channel gains for each user 
change over time. We assume all transmitters and the receiver have a single antenna and that the receiver has 
AWGN with PSD $N_0/2$. Each user has an individual power constraint $\bar{P}_k$, $k = 1, \ldots, K$. The time-varying power
gain of user k’s channel at time i is $g_k[i]$ and is independent of the fading of other users. We define the fading
state at time i as $\mathbf{g}[i] = (g_1[i], \ldots, g_K[i])$, with the time reference dropped when the context is clear. We assume
perfect CSI about the fading state at both the transmitter and receiver; the case of receiver CSI only is treated in
[45, Chapter 6.3]. Like the BC and single-user channels, the fading MAC also has two notions of capacity: the
ergodic capacity region that characterizes the achievable rate vectors averaged over all fading states, and the outage 
capacity region that characterizes the maximum rate vector that can be maintained in all states with possibly some 
nonzero probability of outage. 

We first consider the ergodic capacity region, as derived in [40]. Define a power policy $\mathcal{P}$ as a function that
maps a fading state $\mathbf{g} = (g_1, \ldots, g_K)$ to a set of powers $P_1(\mathbf{g}), \ldots, P_K(\mathbf{g})$, one for each user. Let $\mathcal{F}_{MAC}$ denote
the set of all power policies satisfying the average per-user power constraint $\bar{P}_k$:

$$\mathcal{F}_{MAC} = \left\{ \mathcal{P} : E_{\mathbf{g}}[P_k(\mathbf{g})] \le \bar{P}_k, \; k = 1, \ldots, K \right\}.$$

The MAC capacity region assuming a constant fading state $\mathbf{g}$ with power allocation $P_1(\mathbf{g}), \ldots, P_K(\mathbf{g})$ is given by

$$\mathcal{C}_{MAC}(P_1(\mathbf{g}), \ldots, P_K(\mathbf{g})) = \left\{ (R_1, \ldots, R_K) : \sum_{k \in S} R_k \le B\log_2\left(1 + \frac{\sum_{k \in S} g_kP_k(\mathbf{g})}{N_0B}\right), \; \forall S \subset \{1, 2, \ldots, K\} \right\}. \tag{14.47}$$

The set of achievable rates averaged over all fading states under power policy $\mathcal{P}$ is given by

$$\mathcal{C}_{MAC}(\mathcal{P}) = \left\{ (R_1, \ldots, R_K) : \sum_{k \in S} R_k \le E_{\mathbf{g}}\left[ B\log_2\left(1 + \frac{\sum_{k \in S} g_kP_k(\mathbf{g})}{N_0B}\right) \right], \; \forall S \subset \{1, 2, \ldots, K\} \right\}. \tag{14.48}$$

The ergodic capacity region is then the union over all power policies that satisfy the individual user power constraints:

$$\mathcal{C}_{MAC}(\bar{P}_1, \ldots, \bar{P}_K) = \bigcup_{\mathcal{P} \in \mathcal{F}_{MAC}} \mathcal{C}_{MAC}(\mathcal{P}). \tag{14.49}$$

From (14.41), (14.48), and (14.49), the sum-rate capacity of the MAC in fading reduces to

$$C_{MACSR} = \max_{\mathcal{P} \in \mathcal{F}_{MAC}} E_{\mathbf{g}}\left[ B\log_2\left(1 + \frac{\sum_{k=1}^{K} g_kP_k(\mathbf{g})}{N_0B}\right) \right]. \tag{14.50}$$



The maximization in (14.50) is solved using Lagrangian techniques, and the solution reveals that the optimal 
transmission strategy to achieve sum-rate is to only allow one user to transmit in every fading state [62]. Under 
this optimal policy the user that transmits in a given fading state $\mathbf{g}$ is the one with the largest weighted channel
gain $g_k/\lambda_k$, where $\lambda_k$ is the Lagrange multiplier associated with the average power constraint of the kth user. This
Lagrange multiplier is a function of the user’s average power constraint and fading distribution. By symmetry, if all the
users have the same fading distribution and the same average power constraint, then the $\lambda_k$s are the same for all
users, and the optimal policy is to allow only the user with the best channel $g_k$ to transmit in fading state $\mathbf{g}$. Once
it is determined which user should transmit in a given state, the power the user allocates to that state is determined
via a water-filling over time. The intuition behind only allowing one user at a time to transmit is as follows. Since
users can adapt their powers over time, system resources are best utilized by assigning them to the user with the 
best channel and allowing that user to transmit at a power commensurate with his channel quality. When users 
have unequal average received power this strategy is no longer optimal, since users with weak average received 
SNR would rarely transmit, so their individual power resources would not be utilized as effectively as they could 
be. 
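
The per-state decision described above can be sketched in a few lines. The code assumes the Lagrangian is written as $E_{\mathbf{g}}[B\log_2(1 + \sum_k g_kP_k/(N_0B))] - \sum_k \lambda_k E_{\mathbf{g}}[P_k]$, so the particular scaling of $\lambda_k$ is an assumption made here rather than taken from [62]. The multipliers are treated as given, since finding the values that meet each average power constraint requires the fading distribution. Function names and numbers are hypothetical.

```python
import math

def transmitting_user(gains, lambdas):
    """Index of the user with the largest weighted channel gain g_k / lambda_k."""
    return max(range(len(gains)), key=lambda k: gains[k] / lambdas[k])

def transmit_power(gain, lam, N0, B):
    """Water-filling power for the selected user in this state.

    Follows from the stationarity condition (B / ln 2) * g / (N0*B + g*P) = lam,
    which gives P = B / (lam * ln 2) - N0*B / g, clipped at zero.
    """
    return max(0.0, B / (lam * math.log(2)) - N0 * B / gain)

# Hypothetical state with two users and equal multipliers (the symmetric case).
N0, B = 1e-9, 100e3
gains, lambdas = [0.08, 0.001], [1.44e6, 1.44e6]
k = transmitting_user(gains, lambdas)
P_k = transmit_power(gains[k], lambdas[k], N0, B)
print(k, round(P_k * 1e3, 2), "mW")
```

With equal multipliers the rule reduces to picking the largest $g_k$. A user with a smaller multiplier is allowed to transmit even when its gain is somewhat below another user's.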

The MAC zero-outage capacity region, derived in [42], defines the set of rates that can be simultaneously 
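achieved for all users in all fading states while meeting the average power constraints of each user. From (14.40),
given a power policy $\mathcal{P}$ that maps fading states to user powers, the MAC capacity region in state $\mathbf{g}$ is

$$\mathcal{C}_{MAC}(\mathcal{P}) = \left\{ (R_1, \ldots, R_K) : \sum_{k \in S} R_k \le B\log_2\left(1 + \frac{\sum_{k \in S} g_kP_k(\mathbf{g})}{N_0B}\right), \; \forall S \subset \{1, 2, \ldots, K\} \right\}. \tag{14.51}$$

Then under policy $\mathcal{P}$ the set of rates that can be maintained in all fading states $\mathbf{g}$ is

$$\mathcal{C}^0_{MAC}(\mathcal{P}) = \bigcap_{\mathbf{g}} \mathcal{C}_{MAC}(\mathcal{P}). \tag{14.52}$$

The zero-outage capacity region is then the union of $\mathcal{C}^0_{MAC}(\mathcal{P})$ over all power policies $\mathcal{P}$ that satisfy the user power
constraints, i.e. over all $\mathcal{P} \in \mathcal{F}_{MAC}$. Thus, the zero-outage MAC capacity region is given by

$$\mathcal{C}^0_{MAC}(\bar{P}_1, \ldots, \bar{P}_K) = \bigcup_{\mathcal{P} \in \mathcal{F}_{MAC}} \bigcap_{\mathbf{g}} \mathcal{C}_{MAC}(\mathcal{P}). \tag{14.53}$$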

The outage capacity region of the MAC is similar to the zero-outage capacity region, except that users can 
suspend transmission in some outage states subject to a given nonzero probability of outage. As with the BC, the 
MAC outage capacity region is more difficult to obtain than the zero-outage capacity region, since in any given 
fading state the transmission strategy must determine which users to put into outage, the decoding order of the 
nonoutage users, and the power at which these nonoutage users should transmit. The MAC outage capacity region 
is obtained implicitly in [44] by determining whether a given rate vector R can be maintained in all fading states, 
subject to a given per-user outage probability, without violating the per-user power constraints. Ergodic and outage 
capacities can also be combined to obtain the minimum rate capacity region for the MAC. As with the BC, this 
region characterizes the set of all average rate vectors that can be maintained, averaged over all fading states, 
subject to some minimum rate vector that must be maintained in all states with some outage probability (possibly 
zero). The minimum rate capacity region for the fading MAC is derived in [48] using the duality principle that 
relates capacity regions of the BC and the MAC. This duality principle is described in the next section. 

14.6.3 Capacity with Multiple Antennas

We now consider MAC channels with multiple antennas. We will model the channel based on symmetry between
the MIMO BC on the downlink and the corresponding MIMO MAC on the uplink. As in the MIMO BC model,
we normalize bandwidth to unity, B = 1 Hz, and assume the noise vector $\mathbf{n}$ at the MAC receiver is circularly
symmetric complex Gaussian with $\mathbf{n} \sim N(\mathbf{0}, \mathbf{I})$. Since the channel gains on an uplink and downlink are generally
symmetric, if the channel matrix of user $k$ on the MIMO BC is given by $\mathbf{H}_k$, then the channel gains on the
MIMO MAC corresponding to the uplink of the BC are given by $\mathbf{H}_k^H$. Define $\mathbf{H}^H = [\mathbf{H}_1^H \cdots \mathbf{H}_K^H]$. Then the
capacity region of the Gaussian MIMO MAC where user $k$ has channel gain matrix $\mathbf{H}_k^H$ and power $P_k$ is given by
[51,52,53]

$$C_{MAC}((P_1,\ldots,P_K); \mathbf{H}^H) = \bigcup_{\{\mathbf{Q}_k \ge 0,\ \mathrm{Tr}(\mathbf{Q}_k) \le P_k\ \forall k\}} \left\{ (R_1,\ldots,R_K) : \sum_{k \in S} R_k \le \log\left|\mathbf{I} + \sum_{k \in S} \mathbf{H}_k^H \mathbf{Q}_k \mathbf{H}_k\right| \ \forall S \subseteq \{1,\ldots,K\} \right\}. \tag{14.54}$$



This region is achieved as follows. The $k$th user transmits a zero-mean Gaussian with spatial covariance matrix $\mathbf{Q}_k$.
Each set of covariance matrices $(\mathbf{Q}_1, \ldots, \mathbf{Q}_K)$ corresponds to a $K$-dimensional polyhedron (i.e.
$\{(R_1,\ldots,R_K) : \sum_{k \in S} R_k \le \log|\mathbf{I} + \sum_{k \in S} \mathbf{H}_k^H \mathbf{Q}_k \mathbf{H}_k| \ \forall S \subseteq \{1,\ldots,K\}\}$), and the capacity region is equal to the union (over
all covariance matrices satisfying the power constraints) of all such polyhedrons. The corner points of each polyhedron
can be achieved by successive decoding, in which users' signals are successively decoded and subtracted out of the
received signal. Note that the capacity region (14.54) has several similarities with its single-antenna counterpart:
it is defined based on the rate sum associated with subsets of users, and the corner points of the region are obtained
using successive decoding.

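For the two-user case, each set of covariance matrices corresponds to a pentagon, similar in form to the
capacity region of the single-antenna MAC. For example, the corner point where $R_1 = \log|\mathbf{I} + \mathbf{H}_1^H \mathbf{Q}_1 \mathbf{H}_1|$ and
$R_2 = \log|\mathbf{I} + \mathbf{H}_1^H \mathbf{Q}_1 \mathbf{H}_1 + \mathbf{H}_2^H \mathbf{Q}_2 \mathbf{H}_2| - R_1 = \log|\mathbf{I} + (\mathbf{I} + \mathbf{H}_1^H \mathbf{Q}_1 \mathbf{H}_1)^{-1} \mathbf{H}_2^H \mathbf{Q}_2 \mathbf{H}_2|$ corresponds to decoding
User 2 first (i.e. in the presence of interference from User 1) and decoding User 1 last (without interference from
User 2).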
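
The equality between the two expressions for the rate of User 2 at this corner point follows from the multiplicative property of the determinant. A short derivation is sketched below, where $\mathbf{A}$ and $\mathbf{B}$ are shorthand introduced here only for illustration: $\mathbf{A}$ is the received covariance of User 1 (its channel matrix applied to $\mathbf{Q}_1$) and $\mathbf{B}$ is the corresponding term for User 2. Since $\mathbf{A}$ is positive semidefinite, $\mathbf{I} + \mathbf{A}$ is invertible.

```latex
\begin{align*}
|\mathbf{I} + \mathbf{A} + \mathbf{B}|
  &= \left|(\mathbf{I} + \mathbf{A})\left(\mathbf{I} + (\mathbf{I} + \mathbf{A})^{-1}\mathbf{B}\right)\right|
   = |\mathbf{I} + \mathbf{A}| \, \left|\mathbf{I} + (\mathbf{I} + \mathbf{A})^{-1}\mathbf{B}\right|, \\
R_2 &= \log|\mathbf{I} + \mathbf{A} + \mathbf{B}| - \log|\mathbf{I} + \mathbf{A}|
   = \log\left|\mathbf{I} + (\mathbf{I} + \mathbf{A})^{-1}\mathbf{B}\right| .
\end{align*}
```

The factor $(\mathbf{I} + \mathbf{A})^{-1}$ plays the role of whitening the noise plus the interference from User 1, which is why this rate corresponds to decoding User 2 while User 1's signal is still present.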



14.7 Uplink/Downlink Duality 

The downlink and uplink channels shown in Figure 14.1 appear quite similar: the downlink is almost the same
as the uplink with the direction of the arrows reversed. There are three fundamental differences between the two 
channel models. First, in the downlink there is an additive noise term associated with each receiver, whereas in the 
uplink there is only one additive noise term since there is only one receiver. Another fundamental difference is that 
the downlink has a single power constraint associated with the transmitter, whereas the uplink has different power 
constraints associated with each user. Finally, on the downlink both the signal and interference associated with 
each user travel through the same channel, whereas on the uplink these signals travel through different channels, 
giving rise to the near-far effect. Despite extensive study of uplink and downlink channels individually, there has 
been little effort to draw connections between the two models or exploit these connections in analysis and design. 
In this section we will describe a duality relationship between these two channels, and show how this relationship 
can be used in capacity analysis and in the design of uplink and downlink transmission strategies. 

We say that the $K$-user downlink and uplink, as shown in Figure 14.1 for $K = 3$, are duals of each other under
the following three conditions:

• The channel impulse responses $h_k(t)$, $k = 1, \ldots, K$, in the downlink are the same as in the uplink for all $k$.

• Each receiver in the downlink has the same noise statistics and these statistics are the same as those of the
receiver noise in the uplink.

• The power constraint $P$ on the downlink equals the sum of individual power constraints $P_k$, $k = 1, \ldots, K$,
on the uplink.

Despite the similarities between the downlink (BC) and uplink (MAC), their capacity regions are quite different.
In particular, the two-user AWGN BC capacity region shown by the largest region in Figure 14.10 is markedly different
from the two-user AWGN MAC capacity region shown in Figure 14.14. The capacity regions of dual MACs
and BCs are also very different in fading under any of the fading channel capacity definitions: ergodic, outage, or
minimum-rate capacity. However, despite their different shapes, the capacity regions of the dual channels are both
achieved using a superposition coding strategy, and the optimal decoders for the dual channels exploit successive
decoding and interference cancellation.

The duality relationship between the two channels is based on exploiting their similar encoding and decoding
strategies while bridging their differences by summing the individual MAC power constraints to obtain the BC
power constraint and scaling the BC gains to achieve the near-far effect of the MAC. This relationship was developed
in [48], where it was used to show that the capacity region and optimal transmission strategy of either the
BC or the MAC can be obtained from the capacity region and optimal transmission strategy of the dual channel.
In particular, it was shown in [48] that the capacity region of the AWGN BC with power $P$ and channel gains
$g = (g_1, \ldots, g_K)$ is equal to the capacity region of the dual AWGN MAC with the same channel gains, but where
the MAC is subject to a sum power constraint $\sum_{k=1}^K P_k \le P$ instead of individual power constraints $(P_1, \ldots, P_K)$.
The sum power constraint in the MAC implies that the MAC transmitters draw power from a single pooled power
source with total power $P$, and that power is allocated between the MAC transmitters such that $\sum_{k=1}^K P_k \le P$.
Mathematically, the BC capacity region can be expressed as the union of capacity regions for its dual MAC with a
pooled power constraint as [48]

$$C_{BC}(P; g) = \bigcup_{\{(P_1, \ldots, P_K) : \sum_{k=1}^K P_k = P\}} C_{MAC}(P_1, \ldots, P_K; g) \tag{14.55}$$

where $C_{BC}(P; g)$ is the AWGN BC capacity region with total power constraint $P$ and channel gains $g = (g_1, \ldots, g_K)$,
as given by (14.18) with $n_k = N_0/g_k$, and $C_{MAC}(P_1, \ldots, P_K; g)$ is the AWGN MAC capacity region with individual
power constraints $P_1, \ldots, P_K$ and channel gains $g = (g_1, \ldots, g_K)$, as given by (14.40). This relationship
is illustrated for two users in Figure 14.15 where we see the BC capacity region formed from the union of MAC
capacity regions with different power allocations between MAC transmitters that sum to the total power $P$ of the
dual BC.

In addition to the capacity region relationship of (14.55), it is also shown in [48] that the optimal power 
allocation for the BC associated with any point on the boundary of its capacity region can be obtained from the 
allocation of the sum-power on the dual MAC that intersects with that point. Moreover, the decoding order of 
the BC for that intersection point is the reverse decoding order of this dual MAC. Thus, the optimal encoding 
and decoding strategy for the BC can be obtained from the optimal strategies associated with its dual MAC. This 
connection between optimal uplink and downlink strategies may have interesting implications for practical designs. 
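
For two users with scalar channels, the correspondence between a BC power split and a dual MAC power split can be checked numerically. The sketch below is an illustration under stated assumptions, not the construction used in [48]: bandwidth is normalized to 1 Hz, `n0` is the total noise power, user 1 is assumed to have the stronger channel, and the MAC powers are obtained simply by equating the SINR of each user on the two channels. The gains and powers are hypothetical.

```python
from math import log2

def bc_rates(p_bc, g, n0):
    """Two-user BC rates with superposition coding. User 1 has the stronger
    channel (g[0] > g[1]) and cancels user 2's signal before decoding."""
    r1 = log2(1 + g[0] * p_bc[0] / n0)
    r2 = log2(1 + g[1] * p_bc[1] / (n0 + g[1] * p_bc[0]))
    return r1, r2

def mac_rates(p_mac, g, n0):
    """Dual MAC rates with the reverse order: user 1 is decoded first
    (seeing user 2 as interference) and user 2 is decoded last."""
    r1 = log2(1 + g[0] * p_mac[0] / (n0 + g[1] * p_mac[1]))
    r2 = log2(1 + g[1] * p_mac[1] / n0)
    return r1, r2

def bc_to_mac_powers(p_bc, g, n0):
    """MAC powers that reproduce the BC rates, found by equating SINRs."""
    p2 = p_bc[1] * n0 / (n0 + g[1] * p_bc[0])
    p1 = p_bc[0] * (n0 + g[1] * p2) / n0
    return p1, p2

G, N0 = (2.0, 0.5), 1.0   # hypothetical channel gains and noise power
P_BC = (3.0, 7.0)         # BC power split, total power P = 10
P_MAC = bc_to_mac_powers(P_BC, G, N0)

print("MAC powers:", P_MAC, "sum:", sum(P_MAC))
print("BC rates:  ", bc_rates(P_BC, G, N0))
print("MAC rates: ", mac_rates(P_MAC, G, N0))
```

In this example the MAC powers differ from the BC powers but add up to the same total, and both channels support the same rate pair, which is the behavior that (14.55) and the reversed decoding order describe.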

Figure 14.15: AWGN Downlink (BC) Capacity Region as a Union of Capacity Regions for the Dual Uplink (MAC)

Duality also implies that the MAC capacity region can be obtained from that of its dual BC. This relationship
is based on the notion of channel scaling. It is easily seen from (14.40) that the AWGN MAC capacity region is
not affected if the $k$th user's channel gain $g_k$ is scaled by power gain $\alpha$ as long as its power $P_k$ is also scaled by
$1/\alpha$. However, the dual BC is fundamentally changed by channel scaling since the encoding and decoding order
of superposition coding on the BC is determined by the order of the channel gains. Thus, the capacity region of
the BC with different channel scalings will be different, and it is shown in [48] that the MAC capacity region can
be obtained by taking an intersection of the BC capacity regions over all possible channel scalings $\alpha_k$ on the $k$th user's channel.
Mathematically, we obtain the MAC capacity region from the dual BC as

$$C_{MAC}(P_1, \ldots, P_K; g) = \bigcap_{(\alpha_1, \ldots, \alpha_K) > 0} C_{BC}\left(\sum_{k=1}^K \frac{P_k}{\alpha_k};\ (\alpha_1 g_1, \ldots, \alpha_K g_K)\right). \tag{14.56}$$

This relationship is illustrated for two users with channel gain $g = (g_1, g_2)$ in Figure 14.16. This figure shows that
the MAC capacity region is formed from the intersection of BC capacity regions with different channel scalings
$\alpha$ applied to the first user.⁹ As $\alpha \to 0$, the channel gain $\alpha g_1$ of the 1st user goes to zero but the total power
$P = P_1/\alpha + P_2$ goes to infinity. Since user 2's channel gain doesn't change, he takes advantage of the increased
power and his rate grows asymptotically large as $\alpha$ decreases. The opposite happens as $\alpha \to \infty$: user 1's channel gain
grows and the total power $P = P_1/\alpha + P_2 \to P_2$, so user 1 takes advantage of his increasing channel gain to get
asymptotically large rate with any portion of the total power $P$. All scalings between zero and infinity sketch out
different BC capacity regions that intersect to form the MAC region. In particular, when $\alpha = g_2/g_1$, the channel
gains of both users in the scaled BC channel are the same, and this yields the time-sharing segment of the MAC
capacity region. The optimal decoding order of the MAC for a given point on its capacity region can also be
obtained from the channel scaling associated with the dual scaled BC whose capacity region intersects the MAC
capacity region at that point.
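
One consequence of the scaling argument can be checked with a few lines of arithmetic: the sum-rate capacity of every scaled BC is at least the MAC sum rate, with equality when the scaling equalizes the two channel gains. The sketch below is an illustration under stated assumptions: bandwidth is normalized to 1 Hz, `n0` is the total noise power, the numerical values are hypothetical, and the BC sum-rate capacity is taken to be achieved by giving all power to the user with the larger gain.

```python
from math import log2

def mac_sum_rate(p, g, n0):
    """Sum rate of the two-user AWGN MAC with both users at full power."""
    return log2(1 + (g[0] * p[0] + g[1] * p[1]) / n0)

def scaled_bc_sum_rate(alpha, p, g, n0):
    """Sum-rate capacity of the BC with gains (alpha*g1, g2) and total power
    P1/alpha + P2, assuming all power goes to the user with the larger gain."""
    total_power = p[0] / alpha + p[1]
    best_gain = max(alpha * g[0], g[1])
    return log2(1 + best_gain * total_power / n0)

G, P, N0 = (2.0, 0.5), (1.0, 4.0), 1.0   # hypothetical gains, powers, noise
ALPHA_EQ = G[1] / G[0]                   # scaling that equalizes the gains

for alpha in (0.05, ALPHA_EQ, 1.0, 5.0):
    print(alpha, scaled_bc_sum_rate(alpha, P, G, N0), mac_sum_rate(P, G, N0))
```

Only the sum rate is compared here; reproducing the full intersection in (14.56) would require evaluating the entire BC region for each scaling.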

These duality relationships are extended in [48] to many other important channel models. In particular, duality
applies to fading MACs and BCs, so that the ergodic, outage, and minimum-rate capacity regions, along with the
optimal encoding and decoding strategies, for one channel can be obtained from the regions and strategies for the
dual channel. MAC and BC duality also holds for parallel and frequency-selective fading channels, which defines
the connection between the capacity regions of MACs and BCs with ISI [49, 50]. Another important application of
duality is to multiple antenna (MIMO) MACs and BCs. In [56] the notion of duality between the BC and MAC was
extended to MIMO systems such that the MIMO BC capacity region with power constraint $P$ was shown to equal
the union of capacity regions of the dual MAC, where the union is taken over all individual power constraints
that sum to $P$. Mathematically

$$C_{BC}(P; \mathbf{H}) = \bigcup_{\{(P_1, \ldots, P_K) : \sum_{k=1}^K P_k = P\}} C_{MAC}((P_1, \ldots, P_K); \mathbf{H}^H).$$

Figure 14.16: AWGN Uplink (MAC) Capacity Region as an Intersection of Capacity Regions for the Scaled Dual
Downlink (BC)

⁹ It is sufficient to take the intersection for scaling over just $K - 1$ users because scaling by $(\alpha_1, \ldots, \alpha_{K-1}, \alpha_K)$ is equivalent to scaling

This duality relationship is illustrated in Figure 14.13, where the MIMO BC capacity region is defined by the outer
boundary in the figure. The regions inside this boundary are the MIMO MAC capacity regions under different
individual user power constraints that sum to the total BC power $P$. Recall that the MIMO BC capacity region is
extremely difficult to compute directly, since it is not concave or convex over the covariance matrices that must
be optimized. However, the optimal MIMO MAC is obtained via a standard convex optimization that is easy to
solve [61]. Moreover, duality not only relates the two capacity regions, but can also be used to obtain the optimal
transmission strategy on the MIMO BC capacity region from a duality transformation of the optimal MIMO MAC
strategy that achieves the same point. Thus, for MIMO channels, duality not only greatly simplifies the
calculations in finding the capacity region, but also greatly simplifies finding the corresponding
optimal transmission strategy.

14.8 Multiuser Diversity 

Multiuser diversity takes advantage of the fact that in a system with many users whose channels fade independently, 
at any given time some users will have better channels than others. By transmitting only to users with the best 
channels at any given time, system resources are allocated to the users that can best exploit them, which leads
to improved system capacity and/or performance. Multiuser diversity was first explored in [62] as a means to 
increase throughput and reduce error probability in uplink channels, and the same ideas can be applied to downlink 
channels. The multiuser diversity concept is an extension of the single-user diversity concepts described in Chapter 
7. In single-user diversity systems a point-to-point link consists of multiple independent channels whose signals 
can be combined to improve performance. In multiuser diversity the multiple channels are associated with different
users, and the system typically uses selection-diversity to select the user with the best channel in any given fading 
state. The multiuser diversity gain relies on disparate channels between users, so the larger the dynamic range 
of the fading, the higher the multiuser diversity gain. In addition, as with any diversity technique, performance 
improves with the number of independent channels. Thus, multiuser diversity is most effective in systems with a
large number of users. 

From Section 14.5, we have seen that the total throughput (sum-rate capacity) of the fading downlink is
maximized by allocating the full system bandwidth to the user with the best channel in each fading state. As 
described in Section 14.6, a similar result holds for the fading uplink if all users have the same fading distribution 
and average power. If the users have different fading statistics or average powers, then the channel in any given state 
is allocated to the user with the best weighted channel gain, where the weight depends on the user’s channel gain 
in the given state, his fading statistics, and his average power constraint. The notion of scheduling transmissions 
to users based on their channel conditions is called opportunistic scheduling, and numerical results in [62, 41] 
show that opportunistic scheduling coupled with power control can significantly increase both uplink and downlink 
throughput as measured by sum-rate capacity. 

Opportunistic scheduling can also improve BER performance [62]. Let $\gamma_k[i]$, $k = 1, \ldots, K$, denote the SNR
for each user's channel at time $i$. By transmitting only to the user with the largest SNR, the system SNR at time $i$
is $\gamma[i] = \max_k \gamma_k[i]$. It is shown in [63] that in i.i.d. Rayleigh fading this maximum SNR is roughly a factor of $\ln K$ larger
than the average SNR of any one user as $K$ grows asymptotically large, leading to a multiuser diversity gain in SNR of
$\ln K$. Moreover, if $P_s(\gamma)$ denotes the probability of symbol error for the user with the best channel gain at time
$i$, then $P_s(\gamma)$ will exhibit the same diversity gains as selection-combining in a single-user system (described in
Chapter 7.2.2) as compared to the probability of error associated with any one user. As the number of users in
the system increases, the probability of error approaches that of an AWGN channel without fading, analogous to
increasing the number of branches in single-user selection-combining diversity.
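
The $\ln K$ growth can be illustrated numerically. The sketch below assumes that each user's SNR in Rayleigh fading is exponentially distributed with the same mean, in which case the mean of the largest of $K$ independent SNRs equals the mean SNR times the harmonic number $1 + 1/2 + \cdots + 1/K$, which approaches $\ln K$ plus a constant. The function names, trial count, and seed are assumptions made for this illustration.

```python
import random
from math import log

def mean_max_snr(num_users, avg_snr=1.0):
    """Exact mean of the largest of K i.i.d. exponential SNRs."""
    return avg_snr * sum(1.0 / k for k in range(1, num_users + 1))

def simulate_max_snr(num_users, trials=5000, avg_snr=1.0, seed=0):
    """Monte Carlo estimate of the mean SNR of the best user."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # expovariate takes the rate, i.e. the inverse of the mean SNR
        total += max(rng.expovariate(1.0 / avg_snr) for _ in range(num_users))
    return total / trials

for K in (2, 10, 50):
    print(K, mean_max_snr(K), simulate_max_snr(K), log(K))
```

The gap between the exact mean and $\ln K$ stays bounded as $K$ grows, so the ratio of the two tends to one, consistent with the asymptotic statement above.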

Scheduling transmission to users with the best channel raises two problems in wireless systems: fairness and
delay. If user fade levels change very slowly, then one user will occupy the system for a long period of time. The
time between channel uses for any one user could be quite long, and such latency might be unacceptable for a
given application. In addition, users with poor average SNRs will rarely have the best channel and therefore rarely
get to transmit, which leads to unfairness in the allocation of the system resources. A solution to the fairness and
delay problems in the downlink called proportional fair scheduling was proposed in [63]. Suppose at time $i$ each
of the $K$ users in the downlink system can support rate $R_k[i]$ if allocated the full power and system bandwidth.
Let $T_k[i]$ denote the average throughput of the $k$th user at time $i$, averaged over a time window $[i - i_c, i]$,
where the window size $i_c$ is a parameter of the scheduler design. In the $i$th time slot, the scheduler then transmits
to the user with the largest ratio $R_k[i]/T_k[i]$. With this scheduler, if at time $i$ all users have experienced the same
average throughput $T_k[i] = T[i]$ over the prior time window then the scheduler transmits to the user with the best
channel. Suppose, however, that one user, user $j$, has experienced poor throughput over the prior time window so
that $T_j[i] \ll T_k[i]$, $j \ne k$. Then at time $i$ user $j$ will likely have a high ratio $R_j[i]/T_j[i]$ and thus will be favored
in the allocation of resources at time $i$. Assuming that at time $i$ the user $k^*$ has the highest ratio $R_k[i]/T_k[i]$, the
throughput on the next timeslot is updated as



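$$T_k[i+1] = \begin{cases} \left(1 - \dfrac{1}{i_c}\right) T_k[i] + \dfrac{1}{i_c} R_k[i], & k = k^* \\[2ex] \left(1 - \dfrac{1}{i_c}\right) T_k[i], & k \ne k^* \end{cases} \tag{14.57}$$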



With this scheduling scheme, users with the best channels are still allocated the channel resources when throughput
between users is reasonably fair. However, if the throughput of any one user is poor, that user will be favored
for resource allocation until his throughput becomes reasonably balanced with that of the other users. Clearly
this scheme will have a lower throughput than allocating all resources to the user with the best channel, which
maximizes throughput, and the throughput penalty will increase as the users have more disparate average channel
qualities. The latency with this scheduling scheme is controlled via the time window $i_c$. As the window size
increases the latency also increases, but system throughput increases as well since the scheduler has more flexibility
in allocating resources to users. As the window size grows to the entire transmission time, the proportional
fair scheduler just reduces to allocating system resources to the user with the best channel. The proportional fair
scheduling algorithm is part of the standard for packet data transmission in CDMA2000 cellular systems [64] and
its performance for that system is evaluated in [65]. Alternative methods for incorporating fairness and delay constraints
in opportunistic scheduling have been evaluated in [66, 67], along with their performance under practical
constraints such as imperfect channel estimates.
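
The proportional fair selection rule and the throughput update (14.57) can be sketched as a short simulation. This is a minimal illustration under stated assumptions rather than the scheduler of any particular standard: per-slot rates are taken as log2(1 + SNR) with exponentially distributed SNR (Rayleigh fading), fading is independent from slot to slot, and a small constant `eps` is added to the average throughput to avoid division by zero before a user has been served. All names and numerical values are hypothetical.

```python
import random
from math import log2

def pf_step(avg_tput, rates, window):
    """One slot of proportional fair scheduling using update (14.57)."""
    eps = 1e-9  # assumption: guards against division by zero at start-up
    ratios = [r / (t + eps) for r, t in zip(rates, avg_tput)]
    k_star = max(range(len(rates)), key=ratios.__getitem__)
    decay = 1.0 - 1.0 / window
    updated = [decay * t for t in avg_tput]      # every user's average decays
    updated[k_star] += rates[k_star] / window    # only the served user gains
    return k_star, updated

def simulate(avg_snrs, window, slots=5000, seed=1):
    """Run the scheduler and count how often each user is served."""
    rng = random.Random(seed)
    tput = [0.0] * len(avg_snrs)
    served = [0] * len(avg_snrs)
    for _ in range(slots):
        rates = [log2(1 + rng.expovariate(1.0 / s)) for s in avg_snrs]
        k_star, tput = pf_step(tput, rates, window)
        served[k_star] += 1
    return served, tput

# Hypothetical two-user system: one strong user and one weak user.
served, tput = simulate(avg_snrs=[10.0, 1.0], window=100)
print("slots served:", served)
print("average throughput:", tput)
```

Changing `window` in this sketch is one way to explore the latency and throughput tradeoff described above, and replacing the ratio with the raw rate turns the scheduler into the purely opportunistic one.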

14.9 MIMO Multiuser Systems 

Multiuser systems with multiple antennas at the transmitter(s) and/or receiver(s) are called MIMO multiuser systems.
These multiple antennas can significantly enhance performance in multiple ways. The antennas can be used
to provide diversity gain to improve BER performance. The capacity region of the multiuser channel is increased
by MIMO, providing multiplexing gain. Finally, multiple antennas can provide directivity gain to spatially separate
users, which reduces interference. There is typically a tradeoff between these three types of gains in MIMO
multiuser systems [68].

The multiplexing gain of a MIMO multiuser system characterizes the increase in the uplink or downlink 
capacity region associated with adding multiple antennas. The capacity regions of MIMO multiuser channels have 
been extensively studied, motivated by the large capacity gains associated with single-user systems. For AWGN
channels the MIMO capacity region is known for both the uplink [51] and the downlink [58]. These results can be
extended to find the MIMO capacity region in fading with perfect CSI at all transmitters and receivers. Capacity
results and open problems related to MIMO multiuser fading channels under other assumptions about channel CSI
are described in [69].

Beamforming was discussed in Chapter 10.4 as a technique to achieve full diversity in single-user systems at 
the expense of some capacity loss. In multiuser systems, beamforming has less of a capacity penalty due to the 
multiuser diversity effect, and in fact beamforming can achieve the sum-rate capacity of the MIMO downlink in 
the asymptotic limit of a large number of users [71, 72]. 

Multiuser diversity is based on the idea that in multiuser channels the channel quality varies across users, so 
performance can be improved by allocating system resources at any given time to the users with the best channels.
Design techniques to exploit multiuser diversity were discussed in Section 14.8 for single-antenna multiuser
systems. In MIMO multiuser systems the benefits of multiuser diversity are two-fold. First, MIMO multiuser
diversity provides improved channel quality since only users with the best channels are allocated system resources.
In addition, MIMO multiuser diversity provides abundant directions where users have good channel gains, so that 
the users chosen for resource allocation in a given state not only have very good channel quality, but they also have 
good spatial separation, thereby limiting interference between them. This two-fold diversity benefit allows rela- 
tively simple suboptimal transmitter and receiver techniques to have near-optimal performance as the number of 
users increases [73, 71]. It also eliminates the requirement for multiple receive antennas in downlinks and multiple 
transmit antennas in uplinks to obtain large capacity gains, which simplifies mobile terminal design. In particular, 
the sum-rate capacity gain in MIMO BCs increases roughly linearly with the number of users and transmit anten- 
nas, independent of the number of receive antennas at each user and similarly, the sum-rate capacity gain in MIMO 
MACs increases roughly linearly with the number of users and receive antennas, independent of the number of 
transmit antennas at each user [75]. Note that multiuser diversity increases with the dynamic range and rate of the 
channel fading. By modulating in a controlled fashion the amplitude and phase of multiple transmit antennas, the 
fading rate and dynamic range can be increased, leading to higher multiuser diversity gains. This technique, called 
opportunistic beamforming, is investigated in [63]. 

Space-time modulation and coding techniques for MIMO multiuser systems have also been developed [76, 
77, 70]. The goal of these techniques is to achieve the full range of diversity, multiplexing, and directivity tradeoffs 
inherent to MIMO multiuser systems. Multiuser detection techniques can also be extended to MIMO channels and 
provide substantial performance gains [79, 78, 80]. In wideband channels the multiuser MIMO techniques must
also cope with frequency-selective fading [81, 82, 83]. Advanced transmission techniques for these wideband
channels promise even more significant performance gains than in narrowband channels, since frequency-selective 
fading provides yet another form of diversity. The challenge for MIMO multiuser systems is to develop signaling 
techniques of reasonable complexity that deliver on the promised performance gains even in practical operating 
environments. 

Chapter 14 Problems



1. Consider an FDMA system for multimedia data users. The modulation format requires 10 MHz of spectrum, 
and guard bands of 1 MHz are required on each side of the allocated spectrum to minimize out-of-band 
interference. What total bandwidth is required to support 100 simultaneous users in this system? 

2. GSM systems have 25 MHz of bandwidth allocated to their uplink and downlink, divided into 125 TDMA 
channels, with 8 user timeslots per channel. A GSM frame consists of the 8 timeslots, preceded by a set
of preamble bits and followed by a set of trail bits. Each timeslot consists of 3 start bits at the beginning, 
followed by a burst of 58 data bits, then 26 equalizer training bits, another burst of 58 data bits, 3 stop bits, 
and a guard time corresponding to 8.25 data bits. The transmission rate is 270.833 Kbps. 

(a) Sketch the structure of a GSM frame and a timeslot within the frame. 

(b) Find the fraction of data bits within a timeslot, and the information data rate for each user. 

(c) Find the duration of a frame and the latency between timeslots assigned to a given user in a frame, 
neglecting the duration of the preamble and trail bits. 

(d) What is the maximum delay spread in the channel such that the guard time and stop bits prevent overlap
between timeslots?

3. Consider a DS CDMA system occupying 10 MHz of spectrum. Assume an interference-limited system with 
a spreading gain of $G = 100$ and code cross correlation of $1/G$.

(a) For the MAC, find a formula for the SIR of the received signal as a function of G and the number of 
users K. Assume that all users transmit at the same power and there is perfect power control, so all 
users have the same received power. 

(b) Based on your SIR formula in part (a), find the maximum number of users K that can be supported 
in the system, assuming BPSK modulation with a target BER of $10^{-3}$. In your BER calculation you
can treat interference as AWGN. How does this compare with the maximum number of users K that 
an FDMA system with the same total bandwidth and information signal bandwidth could support? 

(c) Modify your SIR formula in part (a) to include the effect of voice activity, defined as the percentage of 
time that users are talking, so interference is multiplied by this percentage. Also find the voice activity 
factor such that the CDMA system accommodates the same number of users as an FDMA system. Is 
this a reasonable value for voice activity? 

4. Consider an FH CDMA system that uses FSK modulation and the same spreading and information bandwidth
as the DS CDMA system in the previous problem. Thus, there are $G = 100$ frequency slots in the system,
each of bandwidth 100 KHz. The hopping codes are random and uniformly distributed, so the probability
that a given user occupies a given frequency slot on any hop is .01. As in the previous problem, noise is
essentially negligible, so the probability of error on a particular hop if only one user occupies that hop is
zero. Also assume perfect power control, so the received power from all users is the same.

(a) Find an expression for the probability of bit error when m users occupy the same frequency slot. 

(b) Assume there is a total of K users in the system at any time. What is the probability that on any hop 
m there is more than one user occupying the same frequency? 

(c) Find an expression for the average probability of bit error as a function of K, the total number of users 
in the system. 

5. Compute the maximum throughput $T$ for a pure ALOHA and a slotted ALOHA random access system,
along with the load $L$ that achieves the maximum in each case.

6. Consider a pure ALOHA system with a transmission rate of $R = 10$ Mbps. Compute the load $L$ and
throughput $T$ for the system assuming 1000 bit packets and a Poisson arrival rate of $\lambda = 10^3$ packets/sec.
Also compute the effective data rate (rate of bits successfully received). What other value of load $L$ results
in the exact same throughput?

7. Consider a 3-user uplink channel with channel power gains $g_1 = 1$, $g_2 = 3$, and $g_3 = 5$ from user $k$ to the
receiver, $k = 1, 2, 3$. Assume all three users require a 10 dB SINR. The receiver noise is $n = 1$.

(a) Confirm that the vector equation $(\mathbf{I} - \mathbf{F})\mathbf{P} \ge \mathbf{u}$ given by (14.6) is equivalent to the SINR constraints
of each user.

(b) Determine if a feasible power vector exists for this system such that all users meet the required SINR
constraints and, if so, find the optimal power vector $\mathbf{P}^*$ such that the desired SINRs are achieved with
minimum transmit power.

8. Find the two-user broadcast channel capacity region under superposition coding for transmit power P = 10 
mW, $B = 100$ KHz, and $N_0 = 10^{-9}$.

9. Show that the sum-rate capacity of the AWGN BC is achieved by sending all power to the user with the 
highest channel gain. 

10. Derive a formula for the optimal power allocation on a fading broadcast channel to maximize sum-rate. 

11. Find the sum-rate capacity of a two-user fading BC where the fading on each user’s channel is independent. 
Assume each user has a received power of 10 mW and an effective noise power of 1 mW with probability .5 
and 5 mW with probability .5. 

12. Find the sum-rate capacity for a two-user broadcast fading channel where each user experiences Rayleigh 
fading. Assume an average received power of $P = 10$ mW for each user, bandwidth $B = 100$ KHz,
and $N_0 = 10^{-9}$ W/Hz.

13. Consider the set of achievable rates for a broadcast fading channel under frequency-division. Given any rate 
vector in $C_{FD}(\mathcal{P}, \mathcal{B})$ for a given power policy $\mathcal{P}$ and bandwidth allocation policy $\mathcal{B}$, as defined in (14.30),
find the timeslot and power allocation policy that achieves the same rate vector. 

14. Consider a time-varying broadcast channel with total bandwidth $B = 100$ KHz. The effective noise density for user
1 takes the value $n_1 = 10^{-5}$ W/Hz with probability 3/4, and the value $n_1 = 2 \times 10^{-5}$ W/Hz with probability
1/4. The effective noise density for user 2 takes the value $n_2 = 10^{-5}$ W/Hz with probability 1/2, and the value
$n_2 = 2 \times 10^{-5}$ W/Hz with probability 1/2. These noise densities are independent of each other over all time.
The total transmit power is $P = 10$ W.

(a) What is the set of all possible joint noise densities and their corresponding probabilities?

(b) Obtain the optimal power allocation between the two users and the corresponding time-varying capacity
rate region using time-division. Assume user $k$ is allocated a fixed timeslot $\tau_k$ for all time, where
$\tau_1 + \tau_2 = 1$, and a fixed average power $P$ over all time, but that each user may change its power within
its own timeslot, subject to the average constraint $P$. Find a rate point that exceeds this region assuming
you don't divide power equally.

(c) Assume now fixed frequency division, where the bandwidth assigned to each user is fixed and is evenly
divided between the two users: $B_1 = B_2 = B/2$. Assume also that you allocate half the power to each
user within his respective bandwidth ($P_1 = P_2 = P/2$), and you can vary the power over time, subject
only to the average power constraint $P/2$. What is the best rate point that can be achieved? Find a rate
point that exceeds this region assuming that you don't share power and/or bandwidth equally.

(d) Is the rate point ($R_1 = 100{,}000$, $R_2 = 100{,}000$) in the zero-outage capacity region of this channel?

15. Show that the $K$-user AWGN MAC capacity region is not affected if the $k$th user's channel power gain $g_k$
is scaled by $\alpha$, provided the $k$th user's transmit power $P_k$ is also scaled by $1/\alpha$.

16. Consider a multiple access channel being shared by two users. The total system bandwidth is $B = 100$ KHz. 
Transmit power of user 1 is $P_1 = 3$ mW, while transmit power of user 2 is $P_2 = 1$ mW. The receiver noise 
density is .001 $\mu$W/Hz. You can neglect any path loss, fading, or shadowing effects. 

(a) Suppose user 1 requires a data rate of 300 Kbps to see videos. What is the maximum rate that can 
be assigned to user 2 under time-division? How about under superposition coding with successive 
interference cancellation? 

(b) Compute the rate pair $(R_1, R_2)$ where the frequency-division rate region intersects the region achieved 
by code-division with successive interference cancellation ($G = 1$). 

(c) Compute the rate pair $(R_1, R_2)$ such that $R_1 = R_2$ (i.e. where the two users get the same rate) for time 
division and for spread spectrum code division with and without successive interference cancellation 
for a spreading gain $G = 10$. Note: To obtain this region for $G > 1$ you must use the same reasoning 
on the MAC as was used to obtain the BC capacity region with $G > 1$. 

17. Show that the sum-rate capacity of the AWGN MAC is achieved by having all users transmit at full power. 

18. Derive the optimal power adaptation for a two-user fading MAC that achieves the sum-rate point. 

19. Find the sum-rate capacity of a two-user fading MAC where the fading on each user’s channel is independent. 
Assume each user has a received power of 10 mW and an effective noise power of 1 mW with probability .5 
and 5 mW with probability .5. 

20. Consider a 3-user fading downlink with bandwidth 100 KHz. Suppose that the three users all have the same 
fading statistics, so that their received SNR when they are allocated the full power and bandwidth are 5 dB 
with probability 1/3, 10 dB with probability 1/3, and 20 dB with probability 1/3. Assume a discrete time 
system with fading i.i.d. at each time slot. 

(a) Find the maximum throughput of this system if at each time instant the full power and bandwidth are 
allocated to the user with the best channel. 

(b) Simulate the throughput obtained using the proportional fair scheduling algorithm for a window size 
of 1, 5, and 10. 
Chapter 15 

Cellular Systems and Infrastructure-Based 
Wireless Networks 



Infrastructure-based wireless networks have base stations, also called access points, deployed throughout a given 
area. These base stations provide access for mobile terminals to a backbone wired network. Network control 
functions are performed by the base stations, and often the base stations are connected together to facilitate coordi- 
nated control. This infrastructure is in contrast to ad-hoc wireless networks, described in Chapter 16, which have 
no backbone infrastructure. Examples of infrastructure-based wireless networks include cellular phone systems, 
wireless LANs, and paging systems. Base station coordination in infrastructure-based networks provides a centralized 
control mechanism for transmission scheduling, dynamic resource allocation, power control, and handoff. As 
such, it can more efficiently utilize network resources to meet the performance requirements of individual users. 
Moreover, most networks with infrastructure are designed such that mobile terminals transmit directly to a base 
station, with no multihop routing through intermediate wireless nodes. In general these single-hop routes have 
lower delay and loss, higher data rates, and more flexibility than multihop routes. For these reasons, the perfor- 
mance of infrastructure-based wireless networks tends to be much better than in networks without infrastructure. 
However, sometimes it is more expensive or simply not feasible or practical to deploy infrastructure, in which case 
ad-hoc wireless networks are the best option, despite their typically inferior performance. 

Cellular systems are a type of infrastructure-based network that makes efficient use of spectrum by reusing it 
at spatially-separated locations. The focus of this chapter is on cellular system design and analysis, although many 
of these principles apply to any infrastructure-based network. We will first describe the basic design principles of 
cellular systems and channel reuse. System capacity issues are then discussed along with interference reduction 
methods to increase this capacity. We also describe the performance benefits of dynamic resource allocation. The 
chapter closes with an analysis of the fundamental rate limits of cellular systems in terms of their Shannon capacity 
and area spectral efficiency. 

15.1 Cellular System Fundamentals 

The basic premise behind cellular systems is to exploit the power falloff with distance of signal propagation to 
reuse the same channel at spatially-separated locations. Specifically, in cellular systems a given spatial area (like 
a city) is divided into nonoverlapping cells, as shown in Figure 15.1. The signaling dimensions of the system 
are channelized using one of the orthogonal or non-orthogonal techniques discussed in Chapter 14.2. We will 
mostly focus on TDMA and FDMA for orthogonal channelization and CDMA for non-orthogonal channelization. 
Different channel sets Ci are assigned to different cells, with channel sets reused at spatially separated locations. 
This reuse of channels is called frequency reuse or channel reuse. Cells that are assigned the same channel set, 
called co-channel cells, must be spaced far enough apart so that interference between users in co-channel cells does 
not degrade signal quality below tolerable levels. The required spacing depends on the channelization technique, 
the signal propagation characteristics, and the desired performance for each user. 




Figure 15.1: Cellular System. 

For the cellular system shown in Figure 15.1 a base station is located near the center of each cell. Under 
ideal propagation conditions mobiles within a given cell communicate with the base station in that cell, although in 
practice the choice is based on the SINR between the mobile and the base station. When a mobile moves between 
two cells, its call must be handed off from the base station in the original cell to the base station in the new cell. 
The channel from a base station to the mobiles in its cell defines the downlink of the cell, and the channel from 
the mobiles in a cell to the cell base station defines the uplink of the cell. All base stations in a given region are 
connected to a switching office which acts as a central controller. User authentication, allocation of channels, and 
handoff between base stations are coordinated by the switching office. The handoff procedure occurs when the 
signal quality of a mobile to its base station decreases below a given threshold. This occurs when a mobile moves 
between cells and can also be due to fading or shadowing within a cell. If no neighboring base station has available 
channels or can provide an acceptable quality channel then the handoff attempt fails and the call will be dropped. 

The cellular system design must include a specific multiple access technique for both the uplink and the 
downlink. The main multiple access techniques used in cellular systems are TDMA, FDMA, orthogonal and 
non-orthogonal CDMA, and their hybrid combinations. These techniques are sometimes combined with SDMA as 
well. Uplink and downlink design for a single cell was described in Chapter 14.2, and many of the same design 
principles and analyses apply to cellular systems, appropriately modified to include the impact of channel reuse. 
The tradeoffs associated with different multiple access techniques are different in cellular systems than in a single 
cell, since each technique must cope with interference from outside its cell, referred to as intercell or co-channel 
interference. In addition, systems with non-orthogonal channelization must also deal with interference from within 
a cell, called intracell interference. This intracell interference also arises in systems with orthogonal channelization 
when multipath, synchronization errors, and other practical impairments compromise the orthogonality. 

While CDMA with non-orthogonal codes has both intracell and intercell interference inherent to its design, all 
interference is attenuated by the code cross-correlation. In contrast, orthogonal multiple access techniques have no 
intracell interference under ideal operating conditions. However, their intercell interference has no reduction from 
processing gain as in spread spectrum systems. The amount of both intercell and intracell interference experienced 
by a given user is captured by his SINR, defined as 



$$\text{SINR} = \frac{P_r}{N_0 B + P_I}, \tag{15.1}$$



where $P_r$ is the received signal power and $P_I$ is the received power associated with both intracell and intercell 
interference. In CDMA systems $P_I$ is the interference power after despreading. We typically compute the BER of 
a mobile based on SINR in place of SNR, although this approximation is not precisely accurate if the interference 
does not have Gaussian statistics. 

A larger intercell interference reduces SINR, and therefore increases user BER. Intercell interference can be 
kept small by separating cells operating on the same channel by a large distance. However, the number of users that 
can be accommodated in a system is maximized by reusing frequencies as often as possible. Thus, the best cellular 
system design places users that share the same channel at a separation distance where the intercell interference 
is just below the maximum tolerable level for the required data rate and BER. Good cellular system designs are 
interference-limited, meaning that the interference power is much larger than the noise power. Therefore, noise 
is generally neglected in the study of these systems. In this case SINR reduces to the signal-to-interference power 
ratio (SIR) defined as $\text{SIR} = P_r/P_I$. In interference-limited systems, since the BER of users is determined by SIR, 
the number of users that can be accommodated is limited by the interference they cause to other users. Techniques 
to reduce interference, such as multiple antenna techniques or multiuser detection, increase the SIR and therefore 
increase the number of users the system can accommodate for a given BER constraint. Note that the SIR or BER 
requirement is fairly well-defined for continuous applications such as voice. However, system planning is more 
complex for data applications due to the burstiness of the transmissions. 
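
As a rough illustration of why noise is neglected in interference-limited systems, the following sketch evaluates (15.1) and compares it with the SIR. The power levels, noise density, and bandwidth are hypothetical values chosen only for illustration; they do not come from the text.

```python
import math

def sinr(p_r, p_i, n0, bandwidth):
    """SINR of (15.1): received power over noise power plus interference power."""
    return p_r / (n0 * bandwidth + p_i)

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(x)

# Hypothetical values (assumptions, not from the text): 1 uW received signal,
# 50 nW total interference, noise density 1e-17 W/Hz over a 100 KHz channel.
p_r, p_i, n0, bw = 1e-6, 5e-8, 1e-17, 100e3

sinr_db = to_db(sinr(p_r, p_i, n0, bw))
sir_db = to_db(p_r / p_i)   # SIR: the same ratio with the noise term dropped
print(f"SINR = {sinr_db:.4f} dB, SIR = {sir_db:.4f} dB")
```

With interference power several orders of magnitude above the noise power, the two ratios differ by a negligible fraction of a dB, which is the sense in which SINR reduces to SIR.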

Cell size is another important design choice in cellular systems. We can increase the number of users that can 
be accommodated within a given system by shrinking the size of a cell, as long as all aspects of the system scale so 
that the SINR of each user remains the same. Specifically, consider the large and small cells shown in Figure 15.2. 
Suppose the large cell in this figure represents one cell in a cellular system where each cell accommodates K users. 
If the cell size is shrunk to the smaller cell size shown in Figure 15.2, typically by reducing transmit power, and 
everything in the system (including propagation loss) scales so that the SINR in the small cells is the same as in 
the original large cell, then this smaller cell can also accommodate K users. Since there are 19 small cells within 
the large cell, the new system with smaller cells can accommodate $19K$ users within the area of one large cell: a 
19-fold capacity increase in the number of users that can be accommodated. However, propagation characteristics 
typically change as cell size shrinks, so the system does not scale perfectly. Moreover, a smaller cell size increases 
the rate at which handoffs occur, which increases the dropping probability if the percentage of failed handoffs stays 
the same. Smaller cells also increase the load on the backbone network. Moreover, more cells per unit area requires 
more base stations, which can increase system cost. Therefore, while smaller cells generally increase capacity, they 
also have their disadvantages. In a system with large cells, small “hotspot” cells are sometimes embedded within 
large cells that experience high traffic to increase their capacity [1, Chapter 3.7]. 

The cell shape depicted in Figure 15.1 is a hexagon. A hexagon is a tessellating cell shape in that cells can be 
laid next to each other with no overlap to cover the entire geographical region without any gaps. The other tessellating 
shapes are rectangles, squares, diamonds, and triangles. These regular cell shapes are used to approximate 
the contours of constant receive power around the base station. If propagation follows the free-space or simplified 
path loss model where received power is constant along a circle around the base station, then a hexagon provides 
a reasonable approximation to this circular shape. Hexagons were commonly used to approximate cell shapes for 
the original cellular phone systems, where base stations were placed at the tops of buildings with coverage areas 
on the order of a few square miles. For smaller cells, with base stations placed closer to the ground, diamonds 
tend to better approximate the contours of constant power, especially for typical urban street grids [2, 3]. Very 
small cells and indoor cells are heavily dependent on the propagation environment, making it difficult to accurately 
approximate contours of constant power using a tessellating shape [4]. 

Figure 15.2: Capacity Increase by Shrinking Cell Size. 

15.2 Channel Reuse 

Channel reuse is a key element of cellular system design. It determines how much intercell interference is ex- 
perienced by different users, and therefore the system capacity and performance. The channel reuse consider- 
ations are different for channelization via orthogonal multiple access techniques (e.g. TDMA, FDMA, and or- 
thogonal CDMA) as compared to those of non-orthogonal channelization techniques (non-orthogonal or hybrid 
orthogonal/non-orthogonal CDMA). In particular, orthogonal techniques have no intracell interference under ideal 
conditions. However, in TDMA and FDMA, cells using the same channels are typically spaced several cells away, 
since co-channel interference from adjacent cells can be very large. In contrast, non-orthogonal channelization 
exhibits both intercell and intracell interference, but all interference is attenuated by the cross-correlation of the 
spreading codes, which allows channels to be reused in every cell. In CDMA systems with orthogonal codes, typical 
for the downlink in CDMA systems, codes are also reused in every cell, since the code transmissions from each 
base station are not synchronized. Thus, the same codes transmitted from different base stations arrive at a mobile 
with a timing offset, and the resulting intercell interference is attenuated by the code autocorrelation evaluated at 
the timing offset. This autocorrelation may still be somewhat large. A hybrid technique can also be used where 
a non-orthogonal code that is unique to each cell is modulated on top of the orthogonal codes used in that cell. 
The non-orthogonal code then reduces intercell interference by roughly its processing gain. This hybrid approach 
is used in WCDMA cellular systems [5]. Throughout this chapter, we will assume that in CDMA systems the 
same codes are used in every cell, as is typically done in practice. Thus, the reuse distance is one and we need not 
address optimizing channel reuse for CDMA systems. 

We now discuss the basic premise of channel reuse, cell clustering, and channel assignment. In interference- 
limited systems, each user’s BER is based on his received SIR: the ratio of his received signal power over his 
intracell and intercell interference power. The received signal powers associated with the desired signal, the intercell 
interference, and the intracell interference are determined by the characteristics of the channel between the 
desired or interfering transmitters and the desired receiver. The average SIR is normally computed based on path 
loss alone, with median shadowing attenuation incorporated into the path loss models for the signal and interfer- 
ence. Random variations due to shadowing and flat-fading are then treated as statistical variations about the path 
loss. 

Since path loss is a function of propagation distance, the reuse distance D between cells using the same 
channel is an important parameter in determining average intercell interference power. Reuse distance is defined 
as the distance between the centers of cells that use the same channels. It is a function of cell shape, cell size, 
and the number of intermediate cells between the two cells sharing the same channel. Given a required average 
SINR for a particular performance level, we can find the corresponding minimum reuse distance that meets this 
performance target. The focus of this section is planning the cellular system layout based on a minimum reuse 
distance requirement. 

Figure 15.3 illustrates the reuse distance associated with a given channel reuse pattern for diamond and hexagonally 
shaped cells. Cells that are assigned the same channel set $C_n$ are so indicated in the figure. This pattern of 
channel reuse for both cell shapes is based on the notion of cell clustering, discussed in more detail below. The 
reuse distance $D$ between these cells is the minimum distance between the dots at the center of cells using channel 
set $C_n$. The radius of a cell $R$ is also shown in the figure. For hexagonal cells $R$ is defined as the distance from the 
center of a cell to a vertex of the hexagon. For diamond-shaped cells R is the distance from the cell center to the 
middle of a side. 





Figure 15.3: Reuse Distance D for Hexagonal and Diamond Shaped Cells. 

For diamond-shaped cells it is straightforward to compute the reuse distance $D$ based on the number of 
intermediate cells $N_I$ between co-channel cells and the cell radius $R$. Specifically, the distance across a cell is $2R$. 
The distance from a cell center to its boundary is $R$ and the distance across the $N_I$ intermediate cells between the 
co-channel cells is $2RN_I$. Thus, $D = R + 2RN_I + R = 2R(N_I + 1)$. 

Reuse distance for hexagonally-shaped cells is more complicated to determine, since there is not an integer 
number of cells between two co-channel cells. In particular, in Figure 15.3 if channel $C_n$ is used in the center cell 
and again in the shaded cell then there would be exactly two cells between co-channel cells and reuse distance 
would be easy to find. However, when $C_n$ is reused in the cell adjacent to the shaded cell, there is not an integer 
number of cells separating the co-channel cells. This assignment is needed to create cell clusters, as discussed 
in more detail below. The procedure for channel assignment in hexagonal cells is as follows. Consider the cell 
diagram in Figure 15.4 where R is the hexagonal cell radius. Denote the location of each cell by the pair (i,j) 
where, assuming cell A to be centered at the origin (0, 0), the location relative to cell A is obtained by moving $i$ 
cells along the $u$ axis, then turning 60 degrees counterclockwise and moving $j$ cells along the $v$ axis. For example, 
cell G is located at (0, 1), cell S is located at (1, 1), cell P is located at (-2, 2), and cell M is located at (-1, -1). It 
is straightforward to show that the distance between cell centers of adjacent cells is $\sqrt{3}R$, and the distance between 
the centers of the cell located at the point $(i, j)$ and cell A (located at (0, 0)) is given by 

$$D = \sqrt{3}\,R\,\sqrt{i^2 + j^2 + ij}. \tag{15.2}$$



Example 15.1: Find the reuse distance D for the channel reuse shown in Figure 15.3 for both the diamond and 
hexagonally shaped cells as a function of cell radius R. 

Solution: For the diamond shaped cells, there is $N_I = 1$ cell between co-channel cells. Thus, $D = 2R(N_I + 1) = 4R$. 
For the hexagonal cells shown in Figure 15.3 the reuse pattern moves 2 cells along the $u$ axis and then 1 cell 
along the $v$ axis. Thus, $D = \sqrt{3}R\sqrt{2^2 + 1^2 + 2} = 4.58R$. 
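
The two reuse-distance formulas used in this example can be checked numerically. The sketch below is illustrative only; the function names are hypothetical and the cell radius is normalized to $R = 1$.

```python
import math

def reuse_distance_diamond(radius, n_intermediate):
    """D = 2R(N_I + 1) for diamond cells with N_I cells between co-channel cells."""
    return 2 * radius * (n_intermediate + 1)

def reuse_distance_hex(radius, i, j):
    """D = sqrt(3) R sqrt(i^2 + j^2 + ij), as in (15.2)."""
    return math.sqrt(3) * radius * math.sqrt(i * i + j * j + i * j)

R = 1.0                                   # normalized cell radius (assumption)
d_diamond = reuse_distance_diamond(R, 1)  # one intermediate cell
d_hex = reuse_distance_hex(R, 2, 1)       # 2 cells along u, then 1 along v
d_adjacent = reuse_distance_hex(R, 1, 0)  # neighboring hexagonal cells

print(f"diamond: D = {d_diamond:.2f} R")
print(f"hexagon: D = {d_hex:.2f} R")
print(f"adjacent hexagons: {d_adjacent:.3f} R")
```

Setting $(i, j) = (1, 0)$ recovers the spacing $\sqrt{3}R$ between the centers of adjacent hexagonal cells.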







Figure 15.4: Axes for Reuse Distance in Hexagonal Cells. 

Given a minimum acceptable reuse distance $D_{\min}$, we would like to maintain this minimum reuse distance 
throughout the cell grid while reusing channels as often as possible. This requires spatially repeating the cell 
clusters for channel assignment, where each cell in the cluster is assigned a unique set of channels that are not 
assigned to any other cell in the cluster. In order to spatially repeat, cell clusters must tessellate. For diamond-shaped 
cells a tessellating cell cluster forms another diamond, with $K$ cells on each side, as shown in Figure 15.5 
for $K = 4$. The set of channels assigned to the $n$th cell in the cluster is denoted by $C_n$, $n = 1, \ldots, N$, where $N$ is 
the number of unique channel sets, and the pattern of channel assignment is repeated in each cluster. This ensures 
that cells using the same channel are separated by a reuse distance of at least $D = 2KR$. The number of cells per 
cluster is $N = K^2$, which is also called the reuse factor: since $D = 2KR$, we have $N = .25(D/R)^2$. If we let 
$N_c$ denote the number of channels per cell, i.e. the number of channels in $C_n$, and $N_T$ denote the total number 
of channels, then $N = N_T/N_c$. A small value of $N$ indicates efficient channel reuse (channels reused more often 
within a given area for a fixed cell size and shape). However, a small $N$ also implies a small reuse distance, since 
$D = 2KR = 2\sqrt{N}R$, which can lead to large intercell interference. 
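
The relations among $K$, $N$, $D$, and the channels per cell for diamond clusters can be tabulated with a short sketch. The total of 200 channels is a hypothetical figure used only for illustration, and the function names are not from the text.

```python
def diamond_cluster(cells_per_side, radius):
    """Return (N, D) for a diamond cluster with K cells on each side."""
    n = cells_per_side ** 2            # reuse factor N = K^2
    d = 2 * cells_per_side * radius    # reuse distance D = 2KR
    return n, d

def channels_per_cell(total_channels, reuse_factor):
    """N_c = N_T / N, rounded down to a whole number of channels."""
    return total_channels // reuse_factor

R = 1.0      # normalized cell radius (assumption)
N_T = 200    # hypothetical total number of channels

for K in (1, 2, 3, 4):
    N, D = diamond_cluster(K, R)
    print(f"K={K}: N={N}, D={D:.0f}R, channels per cell={channels_per_cell(N_T, N)}")

n4, d4 = diamond_cluster(4, R)   # K = 4, the cluster size shown in Figure 15.5
```

The loop makes the tradeoff concrete: each increase in $K$ lengthens the reuse distance but leaves fewer channels for every cell.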




Figure 15.5: Cell Clusters for Diamond Cells. 

For hexagonal cells we form cell clusters through the following iterative process. The total bandwidth is first 
broken into $N$ channel sets $C_1, \ldots, C_N$, where $N$ is the cluster size. We assign the first channel set $C_1$ to any 
arbitrary cell. From this cell, we move $i$ cells along a chain of hexagons in any direction, turn counterclockwise 
by 60 degrees, and move $j$ cells along the hexagon chain in this new direction: channel set $C_1$ is then assigned to 
this $j$th cell. Going back to the original cell, we repeat the process in a different direction until we have covered all 
directions starting from the initial cell. This process is shown in Figure 15.6 for $i = 3$ and $j = 2$. To assign channel 
set $C_1$ throughout the region, we repeat the iterative process starting from one of the cells assigned channel set 
$C_1$ in a prior iteration until no new assignments can be made starting from any location assigned channel set $C_1$. 

Figure 15.6: Channel Assignment in Hexagonal Cells. 

Then a new cell that has not been assigned any channel set is selected and assigned channel set $C_2$, and the iterative 
channel assignment process for this new channel set is performed. The process is repeated until all cells have 
been assigned a unique channel set, which results in cell clusters as illustrated in Figure 15.7. The reuse distance 
between all cells using the same channel set is then $D = \sqrt{3}R\sqrt{i^2 + j^2 + ij}$. We can obtain the approximate 
cluster size associated with this process by finding the ratio of cluster area to cell area. Specifically, the area of a 
hexagonal cell is $A_{cell} = 3\sqrt{3}R^2/2$, and the area of a cluster of hexagonal cells is $A_{cluster} = \sqrt{3}D^2/2$. Thus, the 
number of cells per cluster is 



$$N = \frac{A_{cluster}}{A_{cell}} = \frac{\sqrt{3}D^2/2}{3\sqrt{3}R^2/2} = \frac{1}{3}\left(\frac{D}{R}\right)^2 = \frac{1}{3}\left(\frac{3R^2(i^2 + j^2 + ij)}{R^2}\right) = i^2 + j^2 + ij.$$



As with diamond cells, a small $N$ indicates more efficient channel reuse, but also a smaller reuse distance 
$D = R\sqrt{3N}$, leading to more intercell interference. 
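
The "move $i$ cells, turn 60 degrees counterclockwise, move $j$ cells" step of the hexagonal assignment procedure can be sketched in code. The sketch assumes the $(u, v)$ coordinate convention of Figure 15.4, with the $v$ axis 60 degrees counterclockwise from the $u$ axis; under that assumption a 60-degree rotation maps $(u, v)$ to $(-v, u + v)$. The function names are hypothetical, and only the six nearest co-channel cells of one starting cell are generated, not the full assignment over the grid.

```python
import math

def rotate_60(cell):
    """Rotate a (u, v) grid offset by 60 degrees counterclockwise."""
    u, v = cell
    return (-v, u + v)

def first_tier_cochannel(i, j):
    """Six nearest co-channel cells of the cell at (0, 0) for shift parameters (i, j)."""
    cells = [(i, j)]
    for _ in range(5):
        cells.append(rotate_60(cells[-1]))  # repeat the move in the next direction
    return cells

def center_distance(cell, radius):
    """Distance from the center of the cell at (0, 0), as in (15.2)."""
    u, v = cell
    return math.sqrt(3) * radius * math.sqrt(u * u + v * v + u * v)

def cluster_size(i, j):
    """Number of cells per cluster, N = i^2 + j^2 + ij."""
    return i * i + j * j + i * j

i, j, R = 3, 2, 1.0                  # i = 3, j = 2 as in Figure 15.6; R normalized
neighbors = first_tier_cochannel(i, j)
N = cluster_size(i, j)
distances = [center_distance(c, R) for c in neighbors]

print("co-channel cells:", neighbors)
print("cluster size N =", N)
print("reuse distance D =", round(distances[0], 3), "R")
```

Every generated cell lies at the same distance from the starting cell, and that distance equals $R\sqrt{3N}$, consistent with the cluster-size derivation above.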



15.3 SIR and User Capacity 

In this section we compute the SIR of users in a cellular system and the number of users per cell that can be 
supported for a given SIR target. We neglect the impact of noise on performance under the assumption that the 
system is interference-limited, although the calculations can be easily extended to include noise. The SIR in a 
cellular system depends on many factors, including the cell layout, size, reuse distance, and propagation. We will 
assume the simplified path loss model (2.39) with reference distance $d_0 = 1$ m for our path loss calculations, so 
$P_r = P_t k d^{-\gamma}$, where $k$ is a constant equal to the average path loss at $d = d_0$ and $\gamma$ is the path loss exponent. 
The path loss exponent associated with in-cell propagation will be denoted by $\gamma_I$ and the path loss exponent for 
intercell interference signals that propagate between cells will be denoted by $\gamma_O$. These path loss exponents may 
be different, depending on the propagation environment and cell size [6]. Using the simplified path loss model, we 
will derive expressions for SIR under both orthogonal and non-orthogonal access techniques. We then find user 
capacity for each model, defined as the maximum number of users per cell that the system can support without 
violating a required SIR target value. 

Figure 15.7: Cell Clusters for Hexagonal Cells. 

The SIR of a signal is typically used to compute the BER performance associated with that signal. Specifically, 
the interference is approximated as AWGN and then formulas for the BER versus SNR are applied. For example, 
from (6.6), performance of uncoded BPSK without fading yields $P_b = Q(\sqrt{2 \cdot \text{SIR}})$, and from (6.58), performance 
when the desired signal exhibits Rayleigh fading yields $\overline{P}_b \approx .25/\text{SIR}$ for high SIRs. Although we have assumed 
the simplified path loss model in the SIR formulas below, more complex path loss models can also be incorporated 
for a more accurate SIR approximation. However, there are a number of inaccuracies in the model that are not 
easy to fix. In particular, approximating the interference as Gaussian noise is accurate for a large number of 
interferers, as is the case for CDMA systems, but not accurate for a small number of interferers, as in TDMA and 
FDMA systems. Moreover, the performance computation in fading neglects the fact that the interferers also exhibit 
fading, which results in a received SIR that is the ratio of two random variables. This ratio has a very complex 
distribution that is not well approximated by a Rayleigh distribution or any other common fading distribution 
[9]. The complexity of modeling average SIR as well as its distribution under accurate path loss, shadowing, and 
multipath fading models can be prohibitive. Thus, the SIR distribution is often obtained via simulations [6]. 
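
The two BER formulas quoted in this section can be evaluated directly once the interference is treated as Gaussian noise. The following is a minimal sketch under that approximation; the function names are hypothetical, the Q-function is computed from the complementary error function as $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, and the 10 dB and 24 dB operating points are illustrative choices.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), written in terms of erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def db_to_linear(db):
    """Convert a decibel value to a linear power ratio."""
    return 10 ** (db / 10)

def ber_bpsk_no_fading(sir):
    """Uncoded BPSK without fading: P_b = Q(sqrt(2 SIR)), SIR in linear units."""
    return q_function(math.sqrt(2 * sir))

def ber_bpsk_rayleigh(avg_sir):
    """High-SIR approximation in Rayleigh fading: P_b ~ .25 / SIR."""
    return 0.25 / avg_sir

ber_static = ber_bpsk_no_fading(db_to_linear(10))   # 10 dB SIR, no fading
ber_fading = ber_bpsk_rayleigh(db_to_linear(24))    # 24 dB average SIR, Rayleigh

print(f"no fading, SIR = 10 dB:  P_b = {ber_static:.2e}")
print(f"Rayleigh, SIR = 24 dB:   P_b = {ber_fading:.2e}")
```

The comparison shows how much more SIR a fading signal needs for a comparable error rate, which is why the required reuse distance grows in fading environments.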

15.3.1 Orthogonal Systems (TDMA/FDMA) 

In this section we compute the SIR and user capacity for cellular systems using orthogonal multiple access tech- 
niques. In these systems there is no intracell interference, so the SIR is determined from the received signal power 
and the interference resulting from co-channel cells. Under the simplified path loss model, the received signal 
power for a mobile located at distance $d$ from its base station on both the uplink and the downlink is $P_r = P_t k d^{-\gamma_I}$. 
The average intercell interference power is a function of the number of out-of-cell interferers. For simplicity we 
will neglect interference from outside the first ring of $M$ interfering cells. This approximation is accurate when the 
path loss exponent $\gamma_O$ is relatively large since then subsequent interfering rings have a much larger path loss than 
the first ring. In this case $M = 6$ for hexagonal cell shapes and $M = 4$ for diamond cell shapes. We assume that 
all transmitters send at the same power $P_t$: the impact of power control will be discussed in Section 15.5.3. Let us 
assume the user is at distance $d < R$ and there are $M$ interferers at distances $d_i$, $i = 1, \ldots, M$, from the intended 
receiver (located at the mobile in the downlink and the base station in the uplink). The resulting SIR is then 

$$\text{SIR} = \frac{d^{-\gamma_I}}{\sum_{i=1}^{M} d_i^{-\gamma_O}}. \tag{15.3}$$



This SIR is the ratio of two random variables, whose distribution can be quite complex. However, the statistics 
of the interference in the denominator have been characterized for several propagation models, and when these 
interferers are log-normal, their sum is well approximated as log-normal [7, Chapter 3]. Note that in general the average SIR for 
uplink and downlink may be roughly the same, but the SIR for the uplink, where interferers can all be on the 
cell boundary closest to the base station they interfere with, generally has a smaller worst-case value than for the 
downlink, where interference comes from base stations at the cell centers. 

The SIR expression can be simplified if we assume that the mobile is on its cell boundary, $d = R$, and all 
interferers are at the reuse distance $D$ from the intended receiver. Under these assumptions the SIR reduces to 

$$\text{SIR} = \frac{R^{-\gamma_I}}{M D^{-\gamma_O}}, \tag{15.4}$$

and if $\gamma_I = \gamma_O = \gamma$ this simplifies further to 

$$\text{SIR} = \frac{1}{M}\left(\frac{D}{R}\right)^{\gamma}. \tag{15.5}$$
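
The general SIR expression and its cell-boundary simplification can be compared numerically. In the sketch below the function names are hypothetical, and the geometry ($R = 1$, $D = 4.58R$, $M = 6$, $\gamma = 4$) is an illustrative choice rather than a value prescribed by the text.

```python
import math

def sir_general(d, interferer_distances, gamma_i, gamma_o):
    """(15.3): d^(-gamma_I) divided by the sum of d_i^(-gamma_O) over interferers."""
    signal = d ** (-gamma_i)
    interference = sum(d_i ** (-gamma_o) for d_i in interferer_distances)
    return signal / interference

def sir_cell_edge(reuse_ratio, num_interferers, gamma):
    """(15.5): (1/M) (D/R)^gamma for a mobile on its cell boundary."""
    return reuse_ratio ** gamma / num_interferers

R, D, M, gamma = 1.0, 4.58, 6, 4     # illustrative geometry (assumption)

sir_from_153 = sir_general(R, [D] * M, gamma, gamma)   # all interferers at D
sir_from_155 = sir_cell_edge(D / R, M, gamma)
sir_interior = sir_general(0.5 * R, [D] * M, gamma, gamma)  # mobile closer in

print(f"cell edge (15.3): {10 * math.log10(sir_from_153):.2f} dB")
print(f"cell edge (15.5): {10 * math.log10(sir_from_155):.2f} dB")
print(f"d = R/2:          {10 * math.log10(sir_interior):.2f} dB")
```

The cell-boundary value is the pessimistic case for the desired signal: a mobile at $d < R$ sees a higher SIR for the same interferer distances.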



Since $D/R$ is a function of the reuse factor $N$ for most cell shapes, this allows us to express SIR in terms of 
$N$. In particular, from Figure 15.5, for diamond-shaped cells we have $M = 8$ and $D/R = \sqrt{4N}$. Substituting 
these into (15.5) yields $\text{SIR} = .125(4N)^{\gamma/2}$. From Figure 15.7, for hexagonally-shaped cells we have $M = 6$ and 
$D/R = \sqrt{3N}$, which yields $\text{SIR} = .167(3N)^{\gamma/2}$. Both of these SIR expressions can be expressed as 

$$\text{SIR} = a_1 (a_2 N)^{\gamma/2}, \tag{15.6}$$



with $a_1 = .125$, $a_2 = 4$ for diamond cells and $a_1 = .167$, $a_2 = 3$ for hexagonally-shaped cells. This formula 
provides a simple approximation for the reuse distance required to achieve a given performance. Specifically, given 
a target SIR value $\text{SIR}_0$ required for a target BER, we can invert (15.6) to obtain the minimum reuse factor that 
achieves this SIR target as 

$$N \geq \frac{1}{a_2}\left(\frac{\text{SIR}_0}{a_1}\right)^{2/\gamma}. \tag{15.7}$$



For path loss exponent $\gamma = 2$, this simplifies to $N \geq \text{SIR}_0/(a_1 a_2)$. When the signal has shadow fading, the 
analysis is more complex, but we can still generally obtain reuse distance in terms of the SIR requirement subject 
to some outage probability [8]. 

The user capacity $C_u$ is defined as the total number of active users per cell that the system can support while 
meeting a common BER constraint for all users. For orthogonal multiple access, $C_u = N_c$, where $N_c$ is the 
number of channels assigned to any given cell. The total number of orthogonal channels of bandwidth $B_s$ that can 
be created from a total system bandwidth of $B$ is $N_T = B/B_s$. Since in orthogonal systems the reuse factor $N$ 
satisfies $N = N_T/N_c$, this implies 

$$C_u = \frac{N_T}{N} = \frac{B}{N B_s} = \frac{G}{N}, \tag{15.8}$$

where $G = B/B_s$ is the ratio of the total system bandwidth to the bandwidth required for an individual user. 



Example 15.2: Consider a TDMA cellular system with hexagonally-shaped cells, and path loss exponent $\gamma = 2$ 
for all signal propagation in the system. Find the minimum reuse factor N needed for a target SIR of 10 dB, and 
the corresponding user capacity assuming a total system bandwidth of 20 MHz and a required signal bandwidth of 
100 KHz. 



Solution: To obtain the reuse factor, we apply (15.7) with $a_1 = .167$ and $a_2 = 3$ to get 

$$N \geq \frac{\text{SIR}_0}{a_1 a_2} = \frac{10}{.167 \times 3} = 19.96,$$

so a reuse factor of $N = 20$ is needed. Now setting $G = B/B_s = (20 \times 10^6)/(100 \times 10^3) = 200$, we get 
$C_u = G/N = 10$ users per cell that can be accommodated. Typically $\gamma > 2$, as we consider in the next example. 



Example 15.3: Consider a TDMA cellular system with diamond shaped cells, path loss exponent $\gamma = 4$ for all 
signal propagation in the system, and BPSK modulation. Assume that the received signal exhibits Rayleigh fading. 
Suppose the users require $\overline{P}_b = 10^{-3}$. Assuming the system is interference-limited, find the minimum reuse factor 
N needed to meet this performance requirement. Also find the user capacity assuming a total system bandwidth 
of 20 MHz and a required signal bandwidth of 100 KHz. 



Solution: Treating interference as Gaussian noise, in Rayleigh fading we have $P_b \approx .25/\mathrm{SIR}_0$ for $\mathrm{SIR}_0$ the
average SIR. The SIR required to meet the $P_b$ target is thus $\mathrm{SIR}_0 = .25/10^{-3} = 250$ (approximately 24 dB).
Substituting $\mathrm{SIR}_0 = 250$, $a_1 = .125$, $a_2 = 4$, and $\gamma = 4$ into (15.7) yields

$$N > \frac{1}{4}\left(\frac{250}{.125}\right)^{2/4} = 11.18.$$

So a reuse factor of $N = 12$ meets the performance requirement. For the user capacity we have $G = B/B_s = 200$,
so $C_u = \lfloor G/N \rfloor = 16$ users per cell can be accommodated. Note that the Gaussian assumption for the interference
is just an approximation which, by the Central Limit Theorem, becomes more accurate as the number of interferers
grows.
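The arithmetic in Examples 15.2 and 15.3 can be checked with a short script. The following Python sketch evaluates (15.7) and (15.8); the function names are illustrative and not from the text, and taking the smallest integer above the bound as the reuse factor is an assumption that ignores any geometric restriction on admissible values of N.

```python
import math


def min_reuse_factor(sir0_linear, a1, a2, gamma):
    """Return (N, bound): the smallest integer N with N > bound, per (15.7)."""
    bound = (sir0_linear / a1) ** (2.0 / gamma) / a2
    return math.floor(bound) + 1, bound


def user_capacity(total_bw_hz, signal_bw_hz, reuse_factor):
    """Whole users per cell, C_u = G/N rounded down, per (15.8)."""
    g = total_bw_hz / signal_bw_hz
    return int(g // reuse_factor)


# Example 15.2: gamma = 2, target SIR of 10 dB, a1 = .167, a2 = 3.
n2, bound2 = min_reuse_factor(10 ** (10 / 10), a1=0.167, a2=3, gamma=2)
c2 = user_capacity(20e6, 100e3, n2)

# Example 15.3: gamma = 4, SIR0 = .25 / 1e-3 = 250, a1 = .125, a2 = 4.
n3, bound3 = min_reuse_factor(0.25 / 1e-3, a1=0.125, a2=4, gamma=4)
c3 = user_capacity(20e6, 100e3, n3)

print(n2, c2, n3, c3)  # 20 10 12 16
```

Changing `gamma` in the first call is a quick way to see how sensitive the reuse factor is to the path loss exponent.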



15.3.2 Non-Orthogonal Systems (CDMA) 

In non-orthogonal systems, codes (i.e. channels) are typically reused in every cell, so the reuse factor is $N = 1$.
Since these systems exhibit both intercell and intracell interference, the user capacity is dictated by the maximum
number of users per cell that can be accommodated for a given target SIR. We will neglect intercell interference
from outside the first tier of interfering cells, i.e. from cells that are not adjacent to the cell of interest. We will
also assume all signals follow the simplified path loss model with the same path loss exponent. This assumption is
typically true for interference from adjacent cells, but ultimately depends on the propagation environment.

Let $N_c = N_T = C_u$ denote the number of channels per cell. In CDMA systems the user capacity is typically
limited by the uplink, due to the near-far problem and the asynchronicity of the codes. Focusing on the uplink,
under the simplified path loss model, the received signal power is $P_r = P_t K d^{-\gamma}$, where $d$ is the distance between
the mobile and its base station. There are $N_c - 1$ asynchronous intracell interfering signals and $MN_c$ asynchronous
intercell interfering signals transmitted from mobiles in the $M$ adjacent cells. Let $d_i$, $i = 1, \ldots, N_c - 1$, denote
the distance from the $i$th intracell interfering mobile to the uplink receiver and $P_i$ denote its power. Let $d_j$,
$j = 1, \ldots, MN_c$, denote the distance from the $j$th intercell interfering mobile to the uplink receiver and $P_j$ denote its
power. From Chapter 13.4.3 all interference is reduced by the spreading code cross correlation $\xi/(3G)$, where $G$
is the processing gain of the system and $\xi$ is a parameter of the spreading codes with $1 \le \xi \le 3$. The total intracell
and intercell interference power is thus given by



$$I = \frac{\xi}{3G}\left[\sum_{i=1}^{N_c-1} P_i K d_i^{-\gamma} + \sum_{j=1}^{MN_c} P_j K d_j^{-\gamma}\right], \qquad (15.9)$$



which yields the SIR 



$$\mathrm{SIR} = \frac{P_t K d^{-\gamma}}{\frac{\xi}{3G}\left[\sum_{i=1}^{N_c-1} P_i K d_i^{-\gamma} + \sum_{j=1}^{MN_c} P_j K d_j^{-\gamma}\right]}. \qquad (15.10)$$



Since all distances in this expression are different, it cannot in general be further simplified without additional
assumptions. Let us therefore assume perfect power control within a cell, so that the received power of the desired
signal and interfering signals within a cell are the same: $P_r = P_t K d^{-\gamma} = P_i K d_i^{-\gamma}$ for all $i$. Furthermore, let

$$\lambda = \frac{\sum_{j=1}^{MN_c} P_j K d_j^{-\gamma}}{(N_c - 1) P_r} \qquad (15.11)$$

denote the ratio of average received power from all intercell interference to that of all intracell interference under
this power control algorithm. Using these approximations we get the following formula for SIR, which is
commonly used for the uplink SIR in CDMA systems with power control [10, 11]:

$$\mathrm{SIR} = \frac{1}{\frac{\xi}{3G}(N_c - 1)(1 + \lambda)}. \qquad (15.12)$$

Under this approximation, for a given SIR target $\mathrm{SIR}_0$, we can determine the user capacity $C_u = N_c$ by setting
(15.12) equal to the target SIR and solving for $C_u$, which yields

$$C_u = 1 + \frac{1}{\frac{\xi}{3G}(1 + \lambda)\,\mathrm{SIR}_0}. \qquad (15.13)$$



Voice signals need not be continuously active due to their statistical nature [12]. The fraction of time that a voice
user actually occupies the channel is called the voice activity factor, and is denoted by $\alpha$, with $0 < \alpha \le 1$. If
the transmitter shuts off during nonactivity then the interference in CDMA, i.e. the denominator of (15.13), is
multiplied by $\alpha$. This increases SIR and therefore user capacity.



Example 15.4: Consider a CDMA cellular system with perfect power control within a cell. Assume a target $\mathrm{SIR}_0$
of 10 dB, a processing gain $G = 200$, spreading codes with $\xi = 2$, and equal average power from inside and outside
the cell ($\lambda = 1$). Find the user capacity of this system.



Solution: From (15.13) we have

$$C_u = 1 + \frac{1}{\frac{2}{600}(2 \times 10)} = 16,$$

so 16 users per cell can be accommodated.
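As a check on Example 15.4, the following Python sketch evaluates (15.13), with the voice activity factor included as an optional multiplier on the interference term. The function name and the value `alpha=0.5` in the last call are hypothetical choices for illustration only; the text does not give a numerical voice activity factor.

```python
import math


def cdma_user_capacity(sir0_linear, G, xi, lam, alpha=1.0):
    """Users per cell from (15.13), rounded down to a whole number.

    alpha scales the interference term as described for voice activity;
    alpha = 1 reproduces (15.13) as written.
    """
    # 1 / ((xi / 3G) * (1 + lam) * alpha * SIR0), written as a single ratio.
    users_beyond_first = 3.0 * G / (xi * (1.0 + lam) * alpha * sir0_linear)
    return math.floor(1.0 + users_beyond_first)


sir0 = 10 ** (10 / 10)  # 10 dB target in linear units

cu_example = cdma_user_capacity(sir0, G=200, xi=2, lam=1)   # Example 15.4
cu_lam2 = cdma_user_capacity(sir0, G=200, xi=2, lam=2)      # more intercell interference
cu_voice = cdma_user_capacity(sir0, G=200, xi=2, lam=1, alpha=0.5)  # hypothetical alpha

print(cu_example, cu_lam2, cu_voice)  # 16 11 31
```

The second call shows how a change in the assumed intercell-to-intracell interference ratio alone moves the capacity estimate by roughly a third.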



Since (15.13) and (15.8) provide simple expressions for user capacity, it is tempting to compare them for
a given SIR target to determine whether TDMA or CDMA can support more users per cell. This was done in
Examples 15.2 and 15.4, where for the same SIR target and other system parameters TDMA yielded 10 channels
per cell while CDMA yielded 16. However, these capacity expressions are extremely sensitive to the modeling and
system assumptions. Increasing $\lambda$ from 1 to 2 in Example 15.4 reduces $C_u$ for CDMA from 16 to 11, and changing
the path loss exponent in Example 15.2 for TDMA from $\gamma = 2$ to $\gamma = 3$ changes the reuse factor $N$ from 20 to 6,
which in turn changes user capacity from 10 to 33. CDMA systems can trade off spreading and coding, yielding
high coding gains and a resulting lower SINR target at the expense of some processing gain [13]; high coding gain
is harder to achieve in a TDMA system since it cannot be traded for spreading gain. Voice activity was not taken
into account for CDMA, which would lead to a higher capacity. Moreover, the CDMA capacity is derived under an
assumption of perfect power control via channel inversion, whereas no power control is assumed for TDMA. The
effects of shadowing and fading are also not taken into account; fading will cause a power penalty in CDMA due
to the channel inversion power control, and will also affect the intercell interference power for both TDMA and
CDMA. All of these factors and tradeoffs significantly complicate the analysis, which makes it difficult to draw
general conclusions about the superiority of one technique over another in terms of user capacity. Analysis of user
capacity for both TDMA and CDMA under various assumptions and models associated with operational systems
can be found in [11, 14, 15, 16].

15.4 Interference Reduction Techniques 

Since cellular systems are ideally interference-limited, any technique that reduces interference increases SIR and 
user capacity. In this section we describe techniques for interference reduction in cellular systems, including 
sectorization, smart antennas, interference averaging, multiuser detection, and interference precancellation. 

A common technique to reduce interference is sectorization. Antenna sectorization uses directional antennas
to divide up a base station's 360 degree omnidirectional antenna into $N$ sectors, as shown in Figure 15.8 for $N = 8$.
As the figure indicates, intracell and intercell interference to a given mobile comes primarily from within its sector.
Thus, sectorization reduces interference power by roughly a factor of $N$ under heavy loading (interferers in every
sector). The channel sets assigned to each sector are different, so that mobiles moving between sectors must be
handed off to a new channel. Sectorization is a common feature in cellular systems, typically with $N = 3$.

Smart antennas generally consist of an antenna array combined with signal processing in both space and time.
Smart antennas can form narrow beams to provide high gain to the desired user's signal, and/or can provide spatial
nulls in the direction of interference signals [20]. However, antenna arrays can also be used for multiplexing gain,
which leads to higher data rates, or diversity gain, which leads to better reliability. These fundamental tradeoffs are
described in Chapter 14.9. The use of multiple antennas for interference reduction versus their other performance
benefits is part of the tradeoffs associated with cellular system design.

Intercell interference in the uplink is often dominated by one or two mobile users located near the cell boundaries
closest to the base station serving the desired signal. In TDMA or FDMA systems, the impact of these
worst-case interferers can be mitigated by superimposing frequency hopping (FH) on top of the TDMA or FDMA
channelization. With this FH overlay, all mobiles change their carrier frequency according to a unique hopping
pattern. Since the hopping pattern of the worst-case interferers differs from that of the desired mobile, these
interferers cause intercell interference to the desired signal only when the two hop patterns overlap in both time
and frequency, which is infrequent. Thus, the FH overlay has the effect of interference averaging, causing intercell
interference from any given cell to be averaged relative to all interferer locations. This greatly reduces the effect of
interference associated with mobiles on cell boundaries. For this reason, an FH overlay is used in the GSM cellular
system.

Figure 15.8: Circular Cell Sectorization for N = 8.
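The averaging effect of an FH overlay can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the desired mobile and one dominant interferer each hop independently and uniformly over the same set of carriers; real hopping sequences are not generated this way, and the function name and parameters are hypothetical.

```python
import random


def collision_fraction(num_carriers, num_hops, seed=0):
    """Fraction of hops in which two independent uniform hop patterns
    land on the same carrier (illustrative model only)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_hops):
        desired = rng.randrange(num_carriers)
        interferer = rng.randrange(num_carriers)
        if desired == interferer:
            hits += 1
    return hits / num_hops


# Without hopping, a co-channel boundary interferer hits every slot.
# With hopping over 8 carriers, it hits roughly 1 slot in 8.
frac = collision_fraction(num_carriers=8, num_hops=100_000)
print(round(frac, 3))
```

Under this model the worst-case interferer is present only about a fraction 1/M of the time for M carriers, so its impact is spread across all users rather than concentrated on one.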

Another method of mitigating interference is multiuser detection [17]. Multiuser detectors jointly detect the
desired signal and some or all interference, so that the detected interference can be mitigated or cancelled out. There
are tradeoffs between performance and complexity of different multiuser detection methods, especially when there
are a large number of interferers. Multiuser detection methods were described in Chapter 13.4 for the uplink of a
single cell CDMA system. However, these detection methods can also be applied to intercell interference signals,
both at the base station [18], where processing complexity is less of a constraint, and in the mobile device to cancel
a few dominant interferers [19].

Interference precancellation takes advantage of the fact that in the downlink, the base station has knowledge
of interference between users within its cell, as well as the interference its transmission causes to mobiles in other
cells. This knowledge can be utilized in the base station transmission to presubtract interference between users
[22, 23, 24]. Interference presubtraction by the base station has its roots in the capacity-achieving strategy for
downlinks, which uses a novel "dirty-paper coding" transmission technique for interference presubtraction [21].
Numerical results in [22, 23, 24] indicate that interference presubtraction can lead to an order of magnitude capacity
increase in cellular systems with a large number of base station antennas. However, the presubtraction requires
CSI at the transmitter, in contrast to multiuser detection, which does not require transmitter CSI.




15.5 Dynamic Resource Allocation 



Cellular systems exhibit significant dynamics in the number of users in any given cell and in their time-varying
channel gains. Moreover, as cellular systems have migrated from primarily voice applications to multimedia data,
users no longer have uniform data rate requirements. Thus, resource allocation to users must become more flexible
to support heterogeneous applications. In dynamic resource allocation the channels, data rates, and power levels
in the system are dynamically assigned relative to the current system conditions and user needs. Much work has
gone into investigating dynamic resource allocation in cellular systems. In this section we summarize some of the
main techniques, including scheduling, dynamic channel allocation, and power control. The references given are
but a small sampling of the vast literature on this important topic.

15.5.1 Scheduling 

The basic premise of scheduling is to dynamically allocate resources to mobile users according to their required 
data rates and delay constraints. Meeting these requirements is also called Quality-of-Service (QoS) support. 
Schedulers must be efficient in their allocation of resources but also fair. There is generally a tradeoff between 
these two goals since, as discussed in Chapter 14.8, the most efficient allocation of resources exploits multiuser 
diversity to allocate resources to the user with the best channel. However, this assumes that the users with the best 
channels can fully utilize these resources, which may not be the case if their application does not require a high 
data rate. Moreover, such an allocation is unfair to users with inferior channels. 
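The tension between efficiency and fairness can be seen in a toy simulation. The sketch below compares a round robin scheduler with one that always serves the user with the best instantaneous channel. The Rayleigh-fading SNR model, the mean SNR values, and all names are assumptions made for illustration; they are not taken from the cited scheduling studies.

```python
import math
import random


def simulate(mean_snrs, num_slots, seed=1):
    """Return per-user average rates (bps/Hz) under round robin and under
    best-channel scheduling, using the same channel draws for both."""
    rng = random.Random(seed)
    k = len(mean_snrs)
    rr = [0.0] * k
    best = [0.0] * k
    for t in range(num_slots):
        # Rayleigh fading: instantaneous SNR is exponential with the given mean.
        rates = [math.log2(1.0 + rng.expovariate(1.0 / m)) for m in mean_snrs]
        rr_user = t % k                                   # fixed rotation
        rr[rr_user] += rates[rr_user]
        best_user = max(range(k), key=lambda u: rates[u])  # multiuser diversity
        best[best_user] += rates[best_user]
    return [r / num_slots for r in rr], [r / num_slots for r in best]


rr_rates, best_rates = simulate(mean_snrs=[10.0, 1.0, 0.1], num_slots=3000)
print("round robin :", [round(r, 3) for r in rr_rates])
print("best channel:", [round(r, 3) for r in best_rates])
```

In this setup the best-channel scheduler delivers a higher total rate, but the user with the weakest average channel is served only rarely, which is the unfairness described above.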

Scheduling has been investigated for both the uplink and the downlink of both CDMA and TDMA systems. 
Note that many CDMA cellular systems use a form of TDMA called high data rate (HDR) for downlink data 
transmission [1, Chapter 2.2], so these CDMA schedulers are based on TDMA channelization. Three different 
scheduling schemes (round robin, equal latency, and relative fairness) were compared for a TDMA downlink
compatible with HDR systems in [26]. TDMA downlink scheduling exploiting multiuser diversity was investigated
in [25]. Scheduling issues for the CDMA downlink, including rate adaptation, fairness, and deadline constraints,
have been explored in [27, 29, 30]. For CDMA systems, uplink scheduling was investigated in [31, 32] assuming
single-user matched filter detection, and improvements using multiuser detection were analyzed in [34]. MIMO
provides another degree of freedom in scheduling system resources, as outlined in [33, 35]. Multiple classes of 
users can also be supported via appropriate scheduling, as described in [28]. 

15.5.2 Dynamic Channel Assignment 

Dynamic channel assignment (DCA) falls into two categories: dynamic assignment of multiple channels within
a cell (intracell DCA) and, for orthogonally channelized systems, dynamic assignment of channels between cells
(intercell DCA). Intercell DCA is typically not applicable to CDMA systems since channels are reused in every
cell. The basic premise of intercell DCA is to make every channel available in every cell, so no fixed channel
reuse pattern exists. Each channel can be used in every cell, as long as the SIR requirements of each user are met.
Thus, channels are assigned to users as needed, and when a call terminates the channel is returned to the pool of
available channels for assignment. Intercell DCA has been shown to improve channel reuse efficiency by a factor
of two or more over fixed reuse patterns, even with relatively simple algorithms [36, 37]. Mathematically, intercell
DCA is a combinatorial optimization problem with channel reuse constraints based on the SIR requirements. Most
dynamic channel allocation schemes assume all system parameters are fixed except for the arrival and departure of
calls [38, 39, 40], with channel reuse constraints defined by a connected graph constant over all time. The channel
allocation problem under this formulation is a generalization of the vertex coloring problem, and is thus NP-hard
[41]. Reduced complexity has been obtained by applying neural networks [42] and simulated annealing [43] to
the problem. However, these approaches can suffer from lack of convergence. The superior efficiency of DCA
between cells is most pronounced under light loading conditions [38]. As traffic becomes heavier, DCA between
cells can suffer from suboptimal allocations which are difficult to reallocate under heavy loading conditions. User
mobility also impacts performance of intercell DCA, since it causes more frequent channel reassignments and
a corresponding increase in dropped calls [44]. Finally, the complexity of DCA between cells, particularly in
systems with small cells and rapidly-changing propagation conditions and user demands, limits the practicality of
such techniques.
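To make the graph formulation concrete, the following Python sketch implements a simple greedy intercell DCA: a call is given the lowest-numbered channel not in use in its own cell or in any neighboring cell, and the channel returns to the pool when the call ends. Replacing the SIR constraints with a fixed neighbor graph, the greedy rule itself, and the class and method names are all simplifying assumptions for illustration; this is not one of the algorithms from the cited references.

```python
class IntercellDCA:
    """Greedy channel assignment over a fixed interference graph."""

    def __init__(self, neighbors, num_channels):
        self.neighbors = neighbors          # cell -> list of conflicting cells
        self.num_channels = num_channels
        self.in_use = {cell: set() for cell in neighbors}

    def assign(self, cell):
        """Return a channel index for a new call, or None if the call is blocked."""
        blocked = set(self.in_use[cell])
        for nb in self.neighbors[cell]:
            blocked |= self.in_use[nb]
        for ch in range(self.num_channels):
            if ch not in blocked:
                self.in_use[cell].add(ch)
                return ch
        return None

    def release(self, cell, ch):
        """Return a channel to the pool when a call terminates."""
        self.in_use[cell].discard(ch)


# Three cells in a line: A and C do not interfere, so they can share a channel.
dca = IntercellDCA({"A": ["B"], "B": ["A", "C"], "C": ["B"]}, num_channels=3)
a1 = dca.assign("A")   # 0
b1 = dca.assign("B")   # 1
c1 = dca.assign("C")   # 0, reused at a distance
a2 = dca.assign("A")   # 2
b2 = dca.assign("B")   # None: every channel is taken in B or a neighbor
dca.release("A", a2)
b3 = dca.assign("B")   # 2, available again after the release
print(a1, b1, c1, a2, b2, b3)
```

The blocked call in cell B illustrates how an earlier greedy choice can leave a heavily loaded system with an allocation that is hard to repair without reassigning ongoing calls.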

Intracell DCA allows dynamic assignment of multiple channels within a cell to a given user. In TDMA 
systems this is done by assigning a user multiple timeslots, and in CDMA by assigning a user multiple codes 
and/or spreading factors. Assigning multiple channels to users in TDMA or orthogonal CDMA systems is relatively 
straightforward and does not change the nature of the channels assigned. Dynamic timeslot assignment was used 
in [52, 53] to meet voice user requirements while reducing the delay of data services. In contrast, intracell DCA 
for non-orthogonal CDMA via either multicode or variable spreading leads to some performance degradation in 
the channels used. Specifically, a user assigned multiple codes in non-orthogonal CDMA creates self-interference 
if a single-user matched filter detector is used on each channel. While this is no worse than if the different codes 
were assigned to different users, it is clearly suboptimal in a scenario where the same user is assigned multiple 
codes. For variable-spreading CDMA, a low spreading factor provides a higher data rate, since more of the total 
system bandwidth is available for data transmission as opposed to spreading. But a reduced spreading factor 
makes the signal more susceptible to interference from other users. Intracell DCA using variable spreading gain 
has been analyzed in [45, 46], while the multicode technique was investigated in [51]. A comparison of multicode 
versus variable-spreading DCA in CDMA systems is given in [47], where it is found that the two techniques
are equivalent for the single-user matched filter detector; however, multicode is superior for more sophisticated
detection techniques.

15.5.3 Power Control 

Power control is a critical aspect of wireless system design. As seen in prior chapters, a water-filling power 
adaptation maximizes capacity of adaptive rate single-user and multiuser systems in fading. Channel-inversion 
power control maintains a fixed received SNR in single-user fading channels, and also eliminates the near-far 
effect in CDMA uplinks. However, in cellular systems these power control policies impact intercell interference in 
different ways, as we now describe. 

Power control on the downlink has less impact on intercell interference than on the uplink, since the downlink 
transmissions all originate from the cell center, whereas uplink transmissions can come from the cell boundaries, 
which exacerbates interference to neighboring cells. Thus we will focus on the effect of power control on the 
uplink. Consider the two cells shown in Figure 15.9. Suppose that both mobiles $B_1$ and $B_2$ in cell B transmit at
the same power. Then the interference caused by the mobile $B_1$ to the base station in cell A will be relatively large,
since it is close to the boundary of cell A, while the interference from $B_2$ will generally be much weaker due to
the longer propagation distance. If water-filling power adaptation is employed, then $B_1$ will generally transmit at
a lower power than $B_2$, since it will typically have a worse channel gain than $B_2$ to the base station in cell B as it
is farther away. This has the positive effect of reducing the intercell interference to cell A. In other words, water- 
filling power adaptation reduces intercell interference from mobiles near cell boundaries, the primary source of this 
interference. A similar phenomenon happens with multiuser diversity, since users transmit only when they have a 
high channel gain to their base station, which is generally true when they are close to their cell center. Conversely, 
under channel inversion the boundary mobiles will transmit at a higher power to maintain the same received power 
at the base station as mobiles near the cell center. This has the effect of increasing intercell interference from 
boundary mobiles. 
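The effect of channel inversion on a neighboring cell can be illustrated numerically. The sketch below uses the simplified path loss model with hypothetical positions: the boundary mobile is 0.9 from its own base station and 1.1 from the neighboring one, and the second mobile is 0.5 and 2.5 away respectively. These distances, the exponent, and the function names are assumptions chosen only to make the comparison concrete, and water-filling is not modeled.

```python
def channel_inversion_powers(dist_to_own_bs, gamma, target_rx_power=1.0, K=1.0):
    """Transmit powers that give every mobile the same received power
    at its own base station under P_r = P_t * K * d**(-gamma)."""
    return [target_rx_power * d ** gamma / K for d in dist_to_own_bs]


def interference_at_neighbor(tx_powers, dist_to_neighbor_bs, gamma, K=1.0):
    """Interference power each mobile causes at the neighboring base station."""
    return [p * K * d ** (-gamma) for p, d in zip(tx_powers, dist_to_neighbor_bs)]


gamma = 4
dist_own = [0.9, 0.5]        # [boundary mobile, interior mobile], hypothetical
dist_neighbor = [1.1, 2.5]   # distances to the base station of the other cell

p_inv = channel_inversion_powers(dist_own, gamma)
# Equal-power baseline with the same total transmit power, for a fair comparison.
p_eq = [sum(p_inv) / len(p_inv)] * len(p_inv)

i_inv = interference_at_neighbor(p_inv, dist_neighbor, gamma)
i_eq = interference_at_neighbor(p_eq, dist_neighbor, gamma)

print("equal power      :", [round(x, 4) for x in i_eq])
print("channel inversion:", [round(x, 4) for x in i_inv])
```

With these numbers the boundary mobile transmits at roughly ten times the power of the interior mobile under channel inversion, and the interference it causes in the neighboring cell is larger than under equal power at the same total transmit power.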

Figure 15.9: Effect of Power Control on Intercell Interference.

The power control algorithm discussed in Chapter 14.4 for a single cell can also be extended to multiple cells.
Assume there are $K$ users in the system with SIR requirement $\gamma_k^*$ for the $k$th user. Focusing on the uplink, the $k$th
user's SIR is given by

$$\gamma_k = \frac{g_k P_k}{n_k + \rho \sum_{j \ne k} g_{kj} P_j}, \quad k = 1, \ldots, K, \qquad (15.14)$$

where $g_k$ is the channel power gain from user $k$ to his base station, $g_{kj} > 0$ is the channel power gain from the
$j$th (intercell or intracell) interfering transmitter to user $k$'s base station, $P_k$ is user $k$'s transmit power, $P_j$ is the
$j$th interferer's transmit power, $n_k$ is the thermal noise power at user $k$'s base station, and $\rho$ is the interference
reduction due to signal processing, i.e. $\rho \approx 1/G$ for CDMA and $\rho = 1$ in TDMA. Similar to the case of uplink
power control, the SIR constraints can be represented in matrix form as $(I - F)P \ge u$ with $P > 0$, where
$P = (P_1, P_2, \ldots, P_K)^T$ is the column vector of transmit powers for each user,

$$u = \left(\frac{\gamma_1^* n_1}{g_1}, \frac{\gamma_2^* n_2}{g_2}, \ldots, \frac{\gamma_K^* n_K}{g_K}\right)^T \qquad (15.15)$$

is the column vector of noise powers scaled by the SIR constraints and channel gain, and $F$ is an irreducible matrix
with non-negative elements given by

$$F_{kj} = \begin{cases} 0, & k = j \\ \gamma_k^* g_{kj} \rho / g_k, & k \ne j \end{cases} \qquad (15.16)$$



with $k, j \in \{1, 2, \ldots, K\}$. As in the uplink power control problem, it is shown in [48, 49, 50] that if the Perron-
Frobenius (maximum modulus) eigenvalue of $F$ is less than unity then there exists a vector $P > 0$ (i.e. $P_k > 0$ for
all $k$) such that the SIR requirements of all users are satisfied, with $P^* = (I - F)^{-1}u$ the Pareto optimal solution.
In other words $P^*$ meets the SIR requirements with the minimum transmit power of the users. Moreover, the
distributed iterative power control algorithm

$$P_k(i+1) = \frac{\gamma_k^*}{\gamma_k(i)} P_k(i) \qquad (15.17)$$



converges to the optimal solution. This is a very simple algorithm for power control since it only requires SIR
information at each transmitter, where each transmitter increases power when its SIR is below its target and
decreases power when its SIR exceeds its target. However, it is important to note that the existence of a feasible power
control vector that meets all SIR requirements is less likely in a cellular system than in the single-cell uplink, since
there are more interferers contributing to the SIR and the channel gains range over a larger set of values than in
a single-cell uplink. When there is no feasible power allocation, the distributed algorithm will result in all users
transmitting at their maximum power and still failing to meet their SIR requirement.
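The distributed algorithm is easy to try on a small example. The Python sketch below assumes the standard form of the model described above: user k's SIR is g_k P_k / (n_k + rho * sum over j != k of g_kj P_j), and each user scales its power by the ratio of target SIR to current SIR. The three-user gain values are invented for illustration, no maximum power limit is modeled, and the power-iteration estimate of the Perron-Frobenius eigenvalue is a convenience, not part of the algorithm.

```python
def sir(k, P, g, cross, noise, rho):
    """SIR of user k at its own base station."""
    interference = sum(cross[k][j] * P[j] for j in range(len(P)) if j != k)
    return g[k] * P[k] / (noise[k] + rho * interference)


def iterate_power_control(g, cross, noise, rho, targets, iters=100, p0=1.0):
    """Each user multiplies its power by (target SIR / current SIR)."""
    P = [p0] * len(g)
    for _ in range(iters):
        current = [sir(k, P, g, cross, noise, rho) for k in range(len(P))]
        P = [targets[k] / current[k] * P[k] for k in range(len(P))]
    return P


def build_F_u(g, cross, noise, rho, targets):
    """Matrix F and vector u of the constraint (I - F) P >= u."""
    K = len(g)
    F = [[0.0 if k == j else targets[k] * cross[k][j] * rho / g[k]
          for j in range(K)] for k in range(K)]
    u = [targets[k] * noise[k] / g[k] for k in range(K)]
    return F, u


def perron_eigenvalue(F, iters=500):
    """Power-iteration estimate of the largest eigenvalue of a positive-off-diagonal F."""
    v = [1.0] * len(F)
    lam = 0.0
    for _ in range(iters):
        w = [sum(F[k][j] * v[j] for j in range(len(F))) for k in range(len(F))]
        lam = max(w)
        v = [x / lam for x in w]
    return lam


# Hypothetical three-user system (rho = 1, as in TDMA).
g = [1.0, 0.8, 0.5]
cross = [[0.0, 0.10, 0.05],
         [0.08, 0.0, 0.10],
         [0.02, 0.06, 0.0]]
noise = [0.01, 0.01, 0.01]
targets = [2.0, 2.0, 2.0]
rho = 1.0

F, u = build_F_u(g, cross, noise, rho, targets)
eig = perron_eigenvalue(F)
powers = iterate_power_control(g, cross, noise, rho, targets)
sirs = [sir(k, powers, g, cross, noise, rho) for k in range(3)]
print("eigenvalue estimate:", round(eig, 3))
print("powers:", [round(p, 4) for p in powers])
print("SIRs  :", [round(s, 4) for s in sirs])
```

Here the eigenvalue estimate is below one, so a feasible power vector exists and the iteration settles at powers that meet every target with equality. Raising the targets or the cross gains until the eigenvalue exceeds one makes the powers grow without bound in this sketch, which corresponds to the infeasible case described above.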

Power control is often combined with scheduling, intercell DCA, or intracell DCA. Intercell DCA and power 
control for TDMA are analyzed in [54, 55], and these techniques are extended to MIMO systems in [56]. Power 
control combined with intracell DCA exploiting multiuser diversity is investigated in [58, 57]. Intracell DCA 
for CDMA via variable spreading is combined with power control in [59, 60]. A comparison of power control 
combined with either multicode or variable spreading CDMA is given in [61].

15.6 Fundamental Rate Limits



15.6.1 Shannon Capacity of Cellular Systems 



There have been few information-theoretic results on the Shannon capacity of cellular systems due to the difficulty
of incorporating channel reuse and the resulting interference into fundamental capacity analysis. While the capacity
for the uplink and downlink of an isolated cell is known, as described in Chapters 14.5-14.6, there is little work
on extending these results to multiple cells. The capacity has been characterized in some cases, but the capacity
and optimal transmission and reception strategy under broad assumptions about channel modeling, base station
cooperation, interference characteristics, and transmitter/receiver CSI are mostly unsolved.

One way to analyze the capacity of a cellular system is to assume that the base stations fully cooperate 
to jointly encode and decode all signals. In this case the notion of cells does not enter into capacity analysis. 
Specifically, under the assumption of full base station cooperation, the multiple base stations can be viewed as a 
single base station with multiple geographically-dispersed antennas. Transmission from the multiple-antenna base 
station to the mobiles can be treated as a MIMO downlink (broadcast channel), and transmission from the mobiles 
to the multiple-antenna base station can be treated as a MIMO uplink (multiple access channel). The Shannon 
capacity regions for both of these channels are known, as discussed in Section 14.9, at least for some channel
models and assumptions about channel side information. 

The uplink capacity of cellular systems under the assumption of full base station cooperation, where signals
received by all base stations are jointly decoded, was first investigated in [62] followed by a more comprehensive
treatment in [63]. In both cases propagation between the mobiles and the base stations is characterized using an
AWGN channel model with a channel gain of unity within a cell, and a gain of $\alpha$, $0 \le \alpha \le 1$, between cells. The
Wyner model of [63] considers both one and two dimensional arrays of cells, and derives the per-user capacity in
both cases, defined as the maximum possible rate that all users can maintain simultaneously, as



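$$C(\alpha) = \frac{B}{K}\int_0^1 \log_2\left(1 + \frac{KP(1 + 2\alpha\cos(2\pi\theta))^2}{N_0 B}\right)d\theta, \qquad (15.18)$$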



where $B$ is the total system bandwidth, $N_0/2$ is the noise PSD, $K$ is the number of mobiles per cell, and $P$
is the average transmit power of each mobile. It is also shown in both [63] and [62] that uplink capacity is
achieved by using orthogonal multiple access techniques (e.g. TDMA) in each cell, and reusing these orthogonal
channels in other cells, although this is not necessarily uniquely optimal. The behavior of $C(\alpha)$ as a function of
$\alpha$, the attenuation of the intercell interference, depends on the SNR of the system. The per-user capacity $C(\alpha)$
generally increases with $\alpha$ at high SNRs, since having strong intercell interference aids in decoding and subsequent
subtraction of the interference from desired signals. However, at low SNR, $C(\alpha)$ initially decreases with $\alpha$ and
then increases. That is because weak intercell interference cannot be decoded reliably and subtracted, so this
interference reduces capacity. As the channel gains associated with the intercell interference grow, the joint
decoding is better able to decode and subtract out this interference, leading to higher capacity.

An alternate analysis method for capacity of cellular systems assumes no base station cooperation so that the
receivers in each cell treat signals from other cells as interference. This approach mirrors the design of cellular
systems in practice. Unfortunately, Shannon capacity of channels with interference is a long-standing open problem
in information theory [64, 65], solved only for the special case of strong interference [66]. By treating the
interference as Gaussian noise, the capacity of both the uplink and downlink can be determined using the single-
cell analysis of Chapters 14.5-14.6. The Gaussian assumption can be viewed as a worst-case assumption about
the interference, since exploiting known structure of the interference can presumably help in decoding the desired
signals and therefore increase capacity. The capacity of a cellular system uplink with fading based on treating
interference as Gaussian noise was obtained in [67] for both one and two dimensional cellular grids. These capacity
results show that with or without fading, when intercell interference is nonnegligible, an orthogonal multiple
access method (e.g. TDMA) within a cell is optimal. This is also the case when channel-inversion power control is
used within a cell. Moreover, in some cases partial or full orthogonalization of channels assigned to different cells
can increase capacity. The effects on capacity for this model when there is partial joint processing between base
stations have also been characterized [68].

The results described above provide some insight into the capacity and optimal transmission strategies for
the uplink of cellular systems. Unfortunately, no such results are yet available for the downlink, under any
assumptions about channel modeling or base station cooperation. While the uplink was the capacity bottleneck for
cellular systems providing two-way voice, the downlink is becoming increasingly critical for multimedia downloads.
Therefore, a better understanding of the capacity limitations and insights for cellular downlinks would be
very beneficial in future cellular system design.

15.6.2 Area Spectral Efficiency 

The Shannon capacity regions for cellular systems described in Section 15.6.1 dictate the set of maximum achievable
rates on the cell uplinks or downlinks. When the capacity region is computed based on the notion of joint
processing at the base stations, then there is effectively only one cell with a multiple-antenna base station. However,
when capacity is computed based on treating intercell interference as Gaussian noise, the capacity region of
both the uplink and downlink becomes highly dependent on the cellular system structure, in particular the cell size
and channel reuse distance. Area spectral efficiency (ASE) is a capacity measure that allows the cellular structure,
in particular the reuse distance, to be optimized relative to fundamental capacity limits.

Recall that for both orthogonal and non-orthogonal channelization techniques, the reuse distance $D$ in a
cellular system defines the distance between any two cell centers that use the same channel. Since these resources
are reused at distance $D$, the area covered by each channel is roughly the area of a circle with radius $.5D$, given
by $A = \pi(.5D)^2$. The larger the reuse distance, the less efficient the channel reuse. However, reducing the reuse
distance increases intercell interference, thereby reducing the capacity region of each cell if this interference is
treated as noise. The ASE captures this tradeoff between efficient resource use and the capacity region per cell.

Consider a cellular system with $K$ users per cell, a reuse distance $D$, and a total bandwidth allocation $B$. Let
$\mathcal{C} = (R_1, R_2, \ldots, R_K)$ denote the capacity region, for either the uplink or the downlink, in a given cell when the
intercell interference from other cells is treated as Gaussian noise. The corresponding sum-rate, also called the
system throughput, is given by

$$C_{SR} = \max_{(R_1, \ldots, R_K) \in \mathcal{C}} \sum_{k=1}^{K} R_k \ \text{bps}. \qquad (15.19)$$

The region $\mathcal{C}$ and corresponding sum-rate $C_{SR}$ can be obtained for any channelization technique within a cell.
Clearly, this capacity region will decrease as intercell interference increases. Moreover, since intercell interference
decreases as the reuse distance increases, the size of the capacity region will increase with the reuse
distance.

The ASE of a cell is defined as the throughput/Hz/unit area that is supported by a cell's resources. Specifically,
given the sum-rate capacity described above, the ASE is defined as

$$A_e = \frac{C_{SR}/B}{\pi(.5D)^2} \ \text{bps/Hz/m}^2. \qquad (15.20)$$

From [67], orthogonal channelization is capacity-achieving within a cell, so we will focus on TDMA for computing
ASE. If we also assume that the system is interference-limited so that noise can be neglected, the rate $R_k$ associated
with each user in a cell is a function of his received signal-to-interference power ratio $\gamma_k = P_k/I_k$, $k = 1, \ldots, K$. If
$\gamma_k$ is constant, then $R_k = \tau_k B \log(1 + \gamma_k)$, where $\tau_k$ is the time fraction assigned to user $k$. Typically, $\gamma_k$ is not
constant, since both the interference and signal power of the $k$th user will vary with propagation conditions and
mobile locations. When $\gamma_k$ varies with time, the capacity region is obtained from optimal resource allocation over
time and across users, as described in Chapters 14.5.4 and 14.6.2.

As a simple example, consider an AWGN TDMA uplink with hexagonal cells of radius R. Assume all users
in the cell are assigned the same fraction of time $\tau_k = 1/K$ and have the same transmit power P. We neglect the
impact of intercell interference outside the first tier of interfering cells, so there are 6 intercell interferers. We take a
pessimistic model where all users in the cell of interest are located at the cell boundary, and all intercell interferers
are located at their cell boundaries closest to the cell of interest. Path loss is characterized by the simplified model
$P_r = P_t K d^{-2}$ within a cell, and $P_r = P_t K d^{-\gamma}$ between cells, where $2 \le \gamma \le 4$. The received signal power of
the kth user is then $P_k = P K R^{-2}$, and the intercell interference power is $I_k = 6 P K (D - R)^{-\gamma}$. The maximum
achievable rate for the kth user in the cell is thus

$$R_k = \frac{B}{K} \log_2\left(1 + \frac{(D - R)^{\gamma}}{6 R^2}\right) \ \text{bps}, \qquad (15.21)$$

and the ASE, obtained by summing (15.21) over the K users and applying (15.20), is

$$A_e = \frac{\log_2\left(1 + (D - R)^{\gamma}/(6 R^2)\right)}{\pi (.5D)^2} \ \text{bps/Hz/m}^2.$$

Plots of $A_e$ versus D for $\gamma = 4$ and $\gamma = 2$ are shown in Figure 15.10, with the cell radius normalized to R = 1.
Comparing these plots we see that, as expected, if the intercell interference path loss falls off more slowly, the ASE
is decreased. However, it is somewhat surprising that the optimal reuse distance is also decreased.

Figure 15.10: Area Spectral Efficiency for AWGN Uplink ($\gamma = 2$ and 4).



Suppose now that the interferers are not on the cell boundaries. If all interferers are at a distance $D - R/2$
from the desired user’s base station, then the ASE becomes

$$A_e = \frac{\log_2\left(1 + (D - R/2)^{\gamma}/(6 R^2)\right)}{\pi (.5D)^2} \ \text{bps/Hz/m}^2.$$

The ASE in this case is plotted in Figure 15.11 for $\gamma = 4$, along with the ASE for interferers on their cell
boundaries. As expected, the ASE is larger for interferers closer to their cell centers than on the boundaries closest
to the cell they interfere with, and the optimal reuse distance is smaller.
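
The trends described above can be checked numerically. The sketch below assumes the pessimistic model of this example (six first-tier interferers, users on the cell boundary, in-cell path loss exponent 2, R normalized to 1) and base-2 logarithms; the function names, the grid search, and its step size are our own illustrative choices, not part of the text.

```python
import math

def ase(D, gamma, interferer_distance, R=1.0):
    """ASE in bps/Hz per unit area for the pessimistic TDMA uplink model.

    The desired user is at distance R (path loss exponent 2) and the six
    first-tier interferers are at `interferer_distance` (exponent gamma).
    """
    sir = interferer_distance ** gamma / (6.0 * R ** 2)
    return math.log2(1.0 + sir) / (math.pi * (0.5 * D) ** 2)

def best_reuse_distance(gamma, offset, R=1.0, d_max=10.0, step=0.01):
    """Grid search over R < D <= d_max with interferers at distance D - offset."""
    n = int(round((d_max - R) / step))
    grid = [R + i * step for i in range(1, n + 1)]
    values = [ase(D, gamma, D - offset, R) for D in grid]
    i_best = max(range(len(grid)), key=values.__getitem__)
    return grid[i_best], values[i_best]

# Interferers on the closest cell boundary (distance D - R), gamma = 4 and 2.
d4, a4 = best_reuse_distance(gamma=4.0, offset=1.0)
d2, a2 = best_reuse_distance(gamma=2.0, offset=1.0)
# Interferers at distance D - R/2, gamma = 4.
d4_half, a4_half = best_reuse_distance(gamma=4.0, offset=0.5)

print("gamma=4, D-R:  ", d4, a4)
print("gamma=2, D-R:  ", d2, a2)
print("gamma=4, D-R/2:", d4_half, a4_half)
```

Under these assumptions the search reproduces the qualitative behavior in the text: the slower intercell path loss ($\gamma = 2$) gives both a lower peak ASE and a smaller optimal D, while moving the interferers back to $D - R/2$ raises the peak ASE and reduces the optimal D.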

ASE has been characterized in [69] for a cellular system uplink with orthogonal channelization assuming
variable-rate transmission and best-case, worst-case, and average intercell interference conditions. The impact of
different fading models, cell sizes, and system load conditions was also investigated. The results indicate that the
optimal reuse factor is 1 for both best-case and average interference conditions, i.e. channels should be reused
in every cell, even though there is no interference reduction from spreading. Moreover, the ASE decreases as an
exponential of a fourth-order polynomial relative to the cell radius, thus quantifying the capacity gains associated
with reducing cell size. A similar framework was used in [70] to characterize the ASE of cellular downlinks.

Figure 15.11: ASE for Interferers at distance $D - R/2$ and at distance D ($\gamma = 4$)

Chapter 15 Problems



1. Consider a city of 10 square kilometers. A macrocellular system design divides the city up into square cells 
of 1 square kilometer, where each cell can accommodate 100 users. Find the total number of users that can 
be accommodated in the system and the length of time it takes a mobile user to traverse a cell (approximate 
time needed for a handoff) when moving at 30 Km/hour. If the cell size is reduced to 100 square meters and 
everything in the system scales so that 100 users can be accommodated in these smaller cells, find the total 
number of users the system can accommodate and the length of time it takes to traverse a cell. 

2. Show that the reuse distance $D = \sqrt{3} R \sqrt{i^2 + j^2 + ij}$ for the channel assignment algorithm associated with
hexagonal cells that is described in Section 15.2.

3. Consider a cellular system with diamond-shaped cells of radius R = 100 m. Suppose the minimum distance 
between cell centers using the same frequency must be D = 600 m to maintain the required SINR.

(a) Find the required reuse factor N and the number of cells per cluster. 

(b) If the total number of channels for the system is 450, find the number of channels that can be assigned 
to each cell. 

(c) Sketch two adjacent cell clusters and show a channel assignment for the two clusters with the required 
reuse distance. 



4. Consider a cellular system with hexagonal cells of radius R = 1 Km. Suppose the minimum distance 
between cell centers using the same frequency must be D = 6 Km to maintain the required SINR.

(a) Find the required reuse factor N and the number of cells per cluster. 

(b) If the total number of channels for the system is 1200, find the number of channels that can be assigned 
to each cell. 

(c) Sketch two adjacent cell clusters and show a channel assignment for the two clusters with the required 
reuse distance. 

5. Compute the SIR for a cellular system with diamond-shaped cells, where the cell radius R = 10 m and
the reuse distance D = 60 m, assuming the path loss exponent within the cell is $\gamma_I = 2$ whereas the
intercell interference has path loss exponent $\gamma_O = 4$. Compare with the SIR for $\gamma = \gamma_I = \gamma_O = 4$ and for
$\gamma = \gamma_I = \gamma_O = 2$. Explain the relative orderings of SIR for each case.

6. Find the minimum reuse distance and user capacity for a cellular system with hexagonally shaped cells, path 
loss exponent $\gamma = 2$ for all signal propagation in the system, and BPSK modulation. Assume an AWGN
channel model with required $P_b = 10^{-6}$, a total system bandwidth of B = 50 MHz, and a required signal
bandwidth of 100 KHz for each user.



7. Consider a CDMA system with perfect power control, a processing gain of G = 100, spreading codes with 
$\xi = 1$ and $\lambda = 1.5$. Find the user capacity of this system with no sectorization and with N = 3 sectors.

8. In this problem we consider the impact of voice activity factor, which creates a random amount of interference.
Consider a CDMA system with SINR

$$\text{SINR} = \frac{G}{\sum_{i=1}^{N_c - 1} X_i + N},$$

where G is the processing gain of the system, the $X_i$ represent intracell interference and follow a Bernoulli
distribution with probability $\alpha = p(X_i = 1)$ equal to the voice activity factor, and N characterizes intercell
interference, and is assumed to be Gaussian with mean $.247 N_c$ and variance $.078 N_c$. The probability of
outage is defined as the probability that the SIR is below some target $\text{SIR}_0$:

$$P_{out} = p(\text{SIR} < \text{SIR}_0).$$



(a) Show that

$$P_{out} = p\left( \sum_{i=1}^{N_c - 1} X_i + N > \frac{G}{\text{SIR}_0} \right).$$

(b) Find an analytical expression for $P_{out}$.

(c) Using the analytical expression from part (b), compute the outage probability for $N_c = 35$ users,
$\alpha = .5$ and a target SIR of $\text{SIR}_0 = 5$ (7 dB).

(d) Assume now that $N_c$ is sufficiently large so that the random variable $\sum_{i=1}^{N_c - 1} X_i$ can be approximated
as a Gaussian random variable. Under this approximation, find the distribution of $\sum_{i=1}^{N_c - 1} X_i + N$ as a
function of $N_c$ and an analytical expression for outage probability based on this approximation.

(e) Compute the outage probability using this approximation for $N_c = 35$ users, $\alpha = .5$ and a target SIR
of $\text{SIR}_0 = 5$ (7 dB). Compare with the results in part (c).



9. Assume a cellular system with K users. Suppose the minimum SIR for each user on the downlink is given 
as $\gamma_1^*, \ldots, \gamma_K^*$. Write down the conditions such that a power control vector exists to satisfy these constraints.

10. Plot the ASE versus reuse distance D, 0 < D < 10 of a TDMA uplink with hexagonal cells of radius 
R = 1, assuming all users in the cell are assigned the same fraction of time and the same transmit power. 
Your computation should be based on a path loss exponent of $\gamma = 2$ and interference from only the first ring of
interfering cells, where the interferers have probability 1/5 of being in one of the following 5 locations: the
cell boundary closest to the cell of interest, halfway between the base station and this closest cell boundary, in
the cell center, the cell boundary farthest from the cell of interest, and halfway between the base station and
this farthest cell boundary. Also plot the ASE and optimal reuse distance for the case where the interferers
are on the closest cell boundary with probability one, and on the farthest cell boundary with probability one.
Compare the differences in these plots. 

11. Consider a one-dimensional cellular system deployed along a highway. The system has square cells of length
2R = 2 Km, as shown in the figure below. This problem focuses on the downlink transmission from the
base station to the mobiles. Assume that each cell has two mobiles located as is indicated in the figure, so
the mobiles in each cell have the exact same location relative to their respective base stations. Assume that
the total transmit power at each base station is $P_t = 5$ W, which is evenly divided between the two users in
its cell. The total system bandwidth is 100 KHz, and the noise density at each receiver is $10^{-16}$ W/Hz. The
propagation follows the model $P_r = P_t K (d_0/d)^3$, where $d_0 = 1$ m and K = 100. All interference should be
treated as AWGN, and interference outside the first ring of interfering cells can be neglected. The system
uses a frequency division strategy, with the bandwidth allocated to each base station evenly divided between 
the two users in its cell. 



(a) For a reuse distance D = 2 (frequencies reused every other cell), what bandwidth is allocated to each
user in the system?

Figure 15.12: One-Dimensional Cellular System with Square Cells



(b) Compute the minimum reuse distance D required to achieve a $10^{-3}$ BER for BPSK modulation in fast
Rayleigh fading. 

(c) Neglecting any fading or shadowing, use the Shannon capacity formula for user rates to compute the 
area efficiency of each cell under frequency division, where frequencies are reused every other cell 
(D = 2). 

12. In this problem we investigate the per-user capacity for the uplink of a cellular system for different system 
parameters. Assume a cellular system uplink where the total system bandwidth is B = 100 KHz. Assume 
in each cell that the noise PSD at each base station receiver is $N_0 = 10^{-9}$ W/Hz, and that all mobiles have
the same transmit power P.

(a) Plot the per-user capacity for the uplink, as given by (15.18) for K = 10 users, transmit power P = 10
mW per user, and $0 < \alpha < 1$. Explain the shape of the curve relative to $\alpha$.

(b) For $\alpha = .5$ and P = 10 mW, plot the per-user capacity for 1 < K < 30. Explain the shape of the
curve relative to K.

(c) For $\alpha = .5$ and K = 10, plot the per-user capacity for 0 < P < 100 mW. Explain the shape of the
curve relative to P.

Chapter 16

Ad Hoc Wireless Networks 



An ad hoc wireless network is a collection of wireless mobile nodes that self-configure to form a network without 
the aid of any established infrastructure, as shown in Figure 16.1. Without an inherent infrastructure, the mobiles 
handle the necessary control and networking tasks by themselves, generally through the use of distributed con- 
trol algorithms. Multihop routing, whereby intermediate nodes relay packets towards their final destination, can 
improve the throughput and power efficiency of the network. Webster’s lists two relevant definitions for ad hoc: 
“formed or used for specific or immediate problems”, and “fashioned from whatever is immediately available.” 
These definitions capture two of the main benefits of ad hoc wireless networks: they can be tailored to specific
applications and they can be formed from whatever network nodes are available. Ad hoc wireless networks have
other appealing features as well. They avoid the cost, installation, and maintenance of network infrastructure.
They can be rapidly deployed and reconfigured. They also exhibit great robustness due to their distributed nature,
node redundancy, and the lack of single points-of-failure. These characteristics are especially important for military
applications, and much of the groundbreaking research in ad hoc wireless networking was supported by the
(Defense) Advanced Research Projects Agency (DARPA) and the Navy [1, 2, 3, 4, 5, 6]. Many of the fundamental
design principles for ad hoc wireless networks were identified and investigated in that early research. However,
despite many advances over the last several decades in wireless communications in general, and ad hoc wireless
networks in particular, the optimal design, performance, and fundamental capabilities of these networks remain
poorly understood, at least in comparison with other wireless network paradigms.

This chapter begins with an overview of the primary applications for ad hoc wireless networks, as applica- 
tions drive many of the design requirements. Next the basic design principles and challenges of these networks 
are described. The concept of protocol layering is then discussed, along with layer interaction and the benefits of 
cross-layer design. Fundamental capacity limits and scaling laws for these networks are also outlined. The chap- 
ter concludes with a discussion of the unique design challenges inherent to energy-constrained ad hoc wireless 
networks. 

16.1 Applications 

This section describes some of the most prevalent applications for ad hoc wireless networks. The self-configuring
nature and lack of infrastructure inherent to these networks make them highly appealing for many applications,
even when they come with a significant performance penalty. The lack of infrastructure is highly desirable for low-cost
commercial systems, since it precludes a large investment to get the network up and running, and deployment costs
may then scale with network success. Lack of infrastructure is also highly desirable for military systems, where
communication networks must be configured quickly as the need arises, often in remote areas. Other advantages of
ad hoc wireless networks include ease of network reconfiguration and reduced maintenance costs. However, these
advantages must be balanced against any performance penalty resulting from the multihop routing and distributed
control inherent to these networks.

Figure 16.1: Ad Hoc Network.

We will focus on the following applications: data networks, home networks, device networks, sensor net- 
works, and distributed control systems. This list is by no means comprehensive, and in fact the success of ad hoc 
wireless networks hinges on making them sufficiently flexible so that there can be accidental successes. Therein 
lies the design dilemma for these networks. If the network is designed for maximum flexibility to support many 
applications (a one-size-fits-all network) then it will be difficult to tailor the network to different application re- 
quirements. This will likely result in poor performance for some applications, especially those with high rate 
requirements or stringent delay constraints. On the other hand, if the network is tailored to a few specific applica- 
tions then designers must predict in advance what these “killer applications” will be - a risky proposition. Ideally 
an ad hoc wireless network should be sufficiently flexible to support many different applications while adapting 
its performance to the applications in operation at any given time. An adaptive cross-layer design can provide this 
flexibility along with the ability to tailor protocol design to the energy constraints in the nodes. 

16.1.1 Data Networks 

Ad hoc wireless data networks primarily support data exchange between laptops, palmtops, personal digital assistants
(PDAs), and other information devices. These data networks generally fall into three categories based on
their coverage area: LANs, MANs, and WANs. Infrastructure-based wireless LANs are already quite prevalent,
and deliver good performance at low cost. However, ad hoc wireless data networks have some advantages over
these infrastructure-based networks. First, only one access point is needed to connect to the backbone wired infrastructure:
this reduces cost and installation requirements. In addition, it can be inefficient for nodes to go through
an access point or base station. For example, PDAs that are next to each other can exchange information directly
rather than routing through an intermediate node.

Wireless MANs typically require multihop routing since they cover a large area. The challenge in these networks
is to support high data rates, in a cost-effective manner, over multiple hops, where the link quality of each
hop is different and changes with time. The lack of centralized network control and potential for high-mobility
users further complicate this objective. Military programs such as DARPA’s GLOMO (Global Mobile Information
Systems) have invested much time and money in building high-speed ad hoc wireless MANs that support
multimedia, with limited success [17, 18]. Ad hoc wireless MANs have also permeated the commercial sector,
with Metricom the best example [19]. While Metricom did deliver fairly high data rates throughout several major
metropolitan areas, significant demand never materialized, forcing Metricom to eventually file for bankruptcy.

Wireless WANs are needed for applications where network infrastructure to cover a wide area is too costly
or impractical to deploy. For example, sensor networks may be dropped into remote areas where network infrastructure
cannot be developed. In addition, networks that must be built up and torn down quickly, e.g. for military
applications or disaster relief, are infeasible without an ad hoc approach.

16.1.2 Home Networks 

Home networks are envisioned to support communication between PCs, laptops, PDAs, cordless phones, smart
appliances, security and monitoring systems, consumer electronics, and entertainment systems anywhere in and
around the home. Such networks could enable smart rooms that sense people and movement and adjust light and
heating accordingly, as well as “aware homes” that network sensors and computers for assisted living of seniors and
those with disabilities. Home networks also encompass video or sensor monitoring systems with the intelligence
to coordinate and interpret data and alert the home owner and the appropriate police or fire department of unusual
patterns; intelligent appliances that coordinate with each other and with the Internet for remote control, software
upgrades, and maintenance scheduling; and entertainment systems that allow access to a VCR, set-top box, or PC
from any television or stereo system in the home [20, 21, 22, 23].

There are several design challenges for such networks. One of the biggest is the need to support the varied
quality-of-service (QoS) requirements for different home networking applications. QoS in this context refers to
the requirements of a particular application, typically data rates and delay constraints, which can be quite stringent
for home entertainment systems. Other big challenges include cost and the need for standardization, since all 
of the devices being supported on this type of home network must follow the same networking standard. Note 
that the different devices accessing a home network have very different power constraints: some will have a fixed 
power source and be effectively unconstrained, while others will have very limited battery power and may not be 
rechargeable. Thus, one of the biggest challenges in home network design is to leverage power in unconstrained 
devices to take on the heaviest communication and networking burden, such that the networking requirements for 
all nodes in the network, regardless of their power constraints, can be met. 

16.1.3 Device Networks 

Device networks support short-range wireless connections between devices. Such networks are primarily intended 
to replace inconvenient cabled connections with wireless connections. Thus, the need for cables and the cor- 
responding connectors between cell phones, modems, headsets, PDAs, computers, printers, projectors, network 
access points, and other such devices is eliminated. The main technology drivers for such networks are low-cost 
low-power radios with networking capabilities such as Bluetooth [8, 24], Zigbee [25], and UWB [26]. The radios 
are integrated into commercial electronic devices to provide networking capabilities between devices. Some common
uses include a wireless headset for cell phones, a wireless USB or RS232 connector, wireless PCMCIA cards,
and wireless set-top boxes.

16.1.4 Sensor Networks



Wireless sensor networks consist of small nodes with sensing, computation, and wireless networking capabilities;
as such, these networks represent the convergence of three important technologies. Sensor networks have enormous
potential for both consumer and military applications. Military missions require sensors and other intelligence 
gathering mechanisms that can be placed close to their intended targets. The potential threat to these mechanisms 
is therefore quite high, so it follows that the technology used must be highly redundant and require as little human 
intervention as possible. An apparent solution to these constraints lies in large arrays of passive electromagnetic, 
optical, chemical, and biological sensors. These can be used to identify and track targets, and can also serve as 
a first line of detection for various types of attacks. Such networks can also support the movement of unmanned, 
robotic vehicles. For example, optical sensor networks can provide networked navigation, routing vehicles around 
obstacles while guiding them into position for defense or attack. The design considerations for some industrial 
applications are quite similar to those for military applications. In particular, sensor arrays can be deployed and 
used for remote sensing in nuclear power plants, mines, and other industrial venues. 

Examples of sensor networks for the home environment include electricity, gas, and water meters that can be 
read remotely through wireless connections. The broad use of simple metering devices within the home can help 
identify and regulate appliances like air conditioners and hot water heaters that are significant consumers of power 
and gas. Simple attachments to power plugs can serve as the metering and communication devices for individual 
appliances. One can imagine a user tracking various types of information on home energy consumption from a 
single terminal: the home computer. Television usage and content could be monitored and remotely controlled in similar
ways. Another important home application is smoke detectors that could not only monitor different parts of the
house but also communicate to track the spread of the fire. Such information could be conveyed to local firefighters 
before they arrived on the scene along with house blueprints. A similar type of array could be used to detect the 
presence and spread of gas leaks or other toxic fumes. 

Sensor arrays also have great potential for use at the sites of large accidents. Consider, for example, the use 
of remote sensing in the rescue operations following the collapse of a building. Sensor arrays could be rapidly 
deployed at the site of the accident and used to track heat, natural gas, and toxic substances. Acoustic sensors and 
localization techniques could be used to detect and locate trapped survivors. It may even be possible to prevent 
such tragedies altogether through the use of sensor arrays. The collapse of bridges, walkways, and balconies, for 
example, could be predicted in advance using stress and motion sensors built into the structures from the outset. 
By inserting a large number of these low-cost low-power sensors directly into the concrete before it is poured, 
material fatigue could be detected and tracked over time throughout the structure. Such sensors must be robust and 
self-configuring, and would require a very long lifetime, commensurate with the lifetime of the structure. 

Most sensors will be deployed with non-rechargeable batteries. The problem of battery lifetime in such 
sensors may be averted through the use of ultra-small energy-harvesting radios. Research in this area promises
radios smaller than one cubic centimeter, weighing less than 100 grams, with a power dissipation level below 
100 microwatts [27]. This low level of power dissipation enables nodes to extract sufficient power from the 
environment - energy harvesting - to maintain operation indefinitely. Such radios open up new applications for 
sensor deployment in buildings, homes, and even the human body. 

16.1.5 Distributed Control Systems 

Ad hoc wireless networks also enable distributed control applications, with remote plants, sensors and actuators
linked together via wireless communication channels. Such networks allow coordination of unmanned mobile
units, and greatly reduce maintenance and reconfiguration costs over distributed control systems with wired communication
links. Ad hoc wireless networks can be used to support coordinated control of multiple vehicles in an
automated highway system, remote control of manufacturing and other industrial processes, and coordination of
unmanned airborne vehicles for military applications.

Current distributed control designs provide excellent performance as well as robustness to uncertainty in 
model parameters. However, these designs are based on closed-loop performance that assumes a centralized ar- 
chitecture, synchronous clocked systems, and fixed topology. Consequently, these systems require that the sensor 
and actuator signals be delivered to the controller with a small, fixed delay. Ad hoc wireless networks cannot 
provide any performance guarantee in terms of data rate, delay or loss characteristics: delays are typically random
and packets may be lost. Unfortunately, most distributed controllers are not robust to these types of communication
errors, and effects of small random delays can be catastrophic [28, 29]. Thus, distributed controllers must
be redesigned for robustness to the random delays and packet losses inherent to wireless networks [30]. Ideally, 
the ad hoc wireless network can be jointly designed with the controller to deliver the best possible end-to-end 
performance. 

16.2 Design Principles and Challenges 

The most fundamental aspect of an ad hoc wireless network is its lack of infrastructure, and most design principles 
and challenges stem from this characteristic. The lack of infrastructure inherent to ad hoc wireless networks 
is best illustrated by contrast with the most prevalent wireless networks: cellular systems and wireless LANs. 
Cellular systems divide the geographic area of interest into cells, and mobiles within a cell communicate with 
a base station in the cell center that is connected to a backbone wired network. Thus, there is no peer-to-peer 
communication between mobiles. All communication is via the base station through single hop routing. The 
base stations and backbone network perform all networking functions, including authentication, call routing, and 
handoff. Most wireless LANs have a similar, centralized, single hop architecture: mobile nodes communicate 
directly with a centralized access point that is connected to the backbone Internet, and the access point performs all 
networking and control functions for the mobile nodes.¹ In contrast, an ad hoc wireless network has peer-to-peer
communication, networking and control functions that are distributed among all nodes, and routing that can exploit
intermediate nodes as relays. 

Ad hoc wireless networks can form an infrastructure or node hierarchy, either permanently or dynamically. 
For example, many ad hoc wireless networks form a backbone infrastructure from a subset of nodes in the network 
to improve network reliability, scalability, and capacity [7]. If a node in this backbone subset leaves the network,
the backbone can be reconfigured. Similarly, some nodes may be chosen to perform as base stations for neighboring
nodes [8]. Thus, ad hoc wireless networks may create structure to improve network performance; however, such
structure is not a fundamental design requirement of the network.

A lack of canonical structure is quite common in wired networks. Indeed, most metropolitan area networks 
(MANs) and wide area networks (WANs), including the Internet, have an ad hoc structure. However, the broadcast 
nature of the radio channel introduces characteristics in ad hoc wireless networks that are not present in their 
wired counterparts. In particular, with sufficient transmit power any node can transmit a signal directly to any 
other node. For a fixed transmit power, the link SINR between two communicating nodes will typically decrease 
as the distance between the nodes increases, and will also depend on the signal propagation and interference 
environment. Moreover, this link SINR varies randomly over time due to fading of the signal and interference. 
Link SINR determines the communication performance of the link: the data rate and associated probability of 
packet error or BER that can be supported on the link. Links with very low SINRs are not typically used due 
to their extremely poor performance, leading to partial connectivity among all nodes in the network, as shown in 
Figure 16.1. However, link connectivity is not a binary decision, as nodes can adapt to the SINR using adaptive
modulation or change it using power control. The different SINR values for different links are illustrated by the
different line widths in Figure 16.1. Thus, in theory, every node in the network can transmit data directly to any
other node. However, this may not be feasible if the nodes are separated by a large distance, and direct transmission
even over a relatively short link may have poor performance or cause much interference to other links. Network
connectivity also changes as nodes enter and leave the network, and this connectivity can be controlled by adapting
the transmit power of existing network nodes to the presence of a new node [9].

¹The 802.11 wireless LAN standard does include ad hoc network capabilities, but this component of the standard is rarely used.
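
To make the dependence of connectivity on transmit power concrete, the following sketch computes which links in a small network exceed a minimum SNR. It is a simplification under stated assumptions: interference is ignored (so SNR stands in for SINR), path loss follows the simplified model $P_r = P_t K d^{-\gamma}$, and the node positions, noise power, threshold, and function names are all hypothetical.

```python
import math

def link_snr(p_tx_w, distance_m, gamma=3.0, k=1.0, noise_w=1e-10):
    """Received SNR under the simplified model P_r = P_t * k * d**(-gamma)."""
    return p_tx_w * k * distance_m ** (-gamma) / noise_w

def usable_links(positions, p_tx_w, snr_min, **model):
    """Return the set of node pairs whose link SNR meets the threshold."""
    links = set()
    names = sorted(positions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(positions[a], positions[b])
            if link_snr(p_tx_w, d, **model) >= snr_min:
                links.add((a, b))
    return links

# Hypothetical three-node network on a line (coordinates in meters).
nodes = {"A": (0.0, 0.0), "B": (50.0, 0.0), "C": (200.0, 0.0)}

low_power = usable_links(nodes, p_tx_w=0.001, snr_min=10.0)
high_power = usable_links(nodes, p_tx_w=0.1, snr_min=10.0)

print(sorted(low_power))
print(sorted(high_power))
```

In this toy setting only the short A-B link is usable at the lower power, whereas raising the transmit power makes every pair directly reachable. A fuller model would add the interference that the higher power causes on other links, which is the tradeoff described above.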

The flexibility in link connectivity that results from varying link parameters such as power and data rate has 
major implications for routing. Nodes can send packets directly to their final destination via single hop routing 
as long as the link SINR is above some minimal threshold. However, the SINR is typically quite poor under 
single hop routing, and this method also causes excessive interference to surrounding nodes. In most ad hoc 
wireless networks, packets are forwarded from source to destination through intermediate relay nodes. Since path
loss causes received power to fall off rapidly with distance (as $d^{-\gamma}$ under the simplified path loss model), using intermediate relays can
greatly reduce the total transmit power (the sum of transmit power at the source and all relays) needed for end-to-end
packet transmission. Multihop routing using intermediate relay nodes is a key feature of ad hoc wireless
networks: it allows for communication between geographically-dispersed nodes and facilitates the scalability and 
decentralized control of the network. However, it is much more challenging to support high data rates and low 
delays over a multihop wireless channel than over the single-hop wireless channels inherent to cellular systems 
and wireless LANs. This is one of the main difficulties in supporting applications with high data rate and low 
delay requirements, such as video, over an ad hoc wireless network. 
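
The transmit power saving from relaying can be illustrated with a short calculation. The sketch below assumes the simplified path loss model $P_r = P_t K d^{-\gamma}$, equally spaced relays on a straight line, and the same required received power at every hop; it ignores the receive and processing energy at the relays as well as the added delay. The function name and all numerical values are hypothetical.

```python
def total_transmit_power(distance_m, hops, gamma=4.0, p_rx_w=1e-9, k=1.0):
    """Total transmit power (W) summed over the source and all relays.

    Each hop covers distance_m / hops and must deliver p_rx_w at its
    receiver under the model P_r = P_t * k * d**(-gamma).
    """
    hop_length = distance_m / hops
    per_hop_power = p_rx_w * hop_length ** gamma / k
    return hops * per_hop_power

single_hop = total_transmit_power(1000.0, hops=1)
four_hops = total_transmit_power(1000.0, hops=4)
saving = single_hop / four_hops

print(single_hop, four_hops, saving)
```

Under these assumptions, splitting a link into n equal hops reduces the total transmit power by a factor of $n^{\gamma - 1}$, so the benefit grows quickly with the path loss exponent. The price is the extra delay and relay energy that the paragraph above identifies as the main difficulty of multihop operation.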

Scalability is required for ad hoc wireless networks with a large number of nodes. The key to scalability lies 
in the use of distributed network control algorithms: algorithms that adjust local performance to account for local 
conditions. By forgoing the use of centralized information and control resources, protocols can scale as the network 
grows since they only rely on local information. Work on protocol scalability in ad hoc wireless networks has 
mainly focused on self-organization [10, 11], distributed routing [12], mobility management [7], and security [13].
Note that distributed protocols often consume a fair amount of energy in local processing and message exchange:
this is analyzed in detail for security protocols in [14]. Thus, interesting tradeoffs arise as to how much local
processing should be done versus transmitting information to a centralized location for processing. This tradeoff
is particularly apparent in sensor networks, where nodes close together have correlated data, and also coordinate
in routing that data through the network. Most experimental work on scalability in ad hoc wireless networks has
focused on relatively small networks, fewer than 100 nodes. Many ad hoc network applications, especially sensor
networks, could have hundreds to thousands of nodes or even more. The ability of existing wireless network 
protocols to scale to such large network sizes remains unclear. 

Energy constraints are another big challenge in ad hoc wireless network design [15]. These constraints arise 
in wireless network nodes powered by batteries that cannot be recharged, such as sensor networks. Hard energy 
constraints significantly impact network design considerations. First, there is no longer a notion of data rate, since 
only a finite number of bits can be transmitted at each node before the battery dies. There is also a tradeoff between 
the duration of a bit and energy consumption, so that sending bits more slowly conserves transmit energy. Standby 
operation can consume significant energy, so sleep modes must be employed for energy conservation, but having 
nodes go to sleep can complicate network control and routing. In fact, energy constraints impact almost all of the 
network protocols in some manner, and therefore energy consumption must be optimized over all aspects of the 
network design. 
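
The tradeoff between bit duration and transmit energy can be sketched under one specific assumption: an AWGN link with unit channel gain operating at its Shannon capacity, so that rate R, bandwidth B, noise power spectral density N0, and transmit power P satisfy R = B log2(1 + P/(N0 B)). The bandwidth and noise values below are hypothetical, and circuit and standby energy are ignored.

```python
import math


def energy_per_bit(rate_bps, bandwidth_hz=1e6, noise_psd=1e-9):
    """Transmit energy per bit (joules) at the capacity limit of an AWGN link.

    Assumes R = B*log2(1 + P/(N0*B)) with unit channel gain, so the required
    power is P = N0*B*(2**(R/B) - 1) and the energy per bit is P/R.
    """
    power = noise_psd * bandwidth_hz * (2.0 ** (rate_bps / bandwidth_hz) - 1.0)
    return power / rate_bps


rates = (4e6, 2e6, 1e6, 0.5e6)
energies = [energy_per_bit(rate) for rate in rates]
for rate, energy in zip(rates, energies):
    print(f"{rate / 1e6:.1f} Mbps -> {energy:.3e} J/bit")
print(f"limit as rate -> 0: {1e-9 * math.log(2):.3e} J/bit")
```

In this model, lowering the rate (lengthening each bit) always reduces transmit energy per bit, but with diminishing returns as it approaches N0 ln 2. Once standby and circuit energy are included, very slow transmission is no longer free, which is one reason sleep modes matter.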

16.3 Protocol Layers 

Protocol layering is a common abstraction in network design. Layering provides design modularity for network 
protocols that facilitates standardization and implementation. Unfortunately, the layering paradigm does not work 
well in ad hoc wireless networks, where many protocol design issues are intertwined. In this section we describe
protocol layering as it applies to ad hoc wireless networks, as well as the interactions between protocol layers,
which motivate the need for cross-layer design.

An international standard called the Open System Interconnection (OSI) model was developed as a framework 
for protocol layering in data networks. The OSI model divides the required functions of the network into seven 
layers: the application layer, presentation layer, session layer, transport layer, network layer, data link control
layer, and physical layer. Each layer is responsible for a separate set of tasks, with a fixed interface between
layers to exchange control information and data. The basic premise behind the OSI model is that the protocols
developed at any given layer can interoperate with protocols developed at other layers without regard to the details
of the protocol implementation. For example, the application layer need not consider how data is routed through
the network, or what modulation and coding technique is used on a given link. The set of protocols associated 
with all layers is referred to as the protocol stack of the network. Details of the OSI model and the functionality 
associated with each of its layers are given in [31, Chapter 1.3]. 

The Internet has driven the actual implementation of layering, which is built around the Transmission Control
Protocol (TCP) for the transport layer and the Internet Protocol (IP) for routing at the network layer. Thus, in
most networks the OSI layering model has been replaced by a five-layer model, also called the TCP/IP model, that
is defined by the main functionality of the TCP and IP protocols. The five layers consist of the application layer,
transport layer, network layer, access layer, and physical layer. These layers are illustrated in Figure 16.2, along
with their primary functions in ad hoc wireless networks. These functions will be described in more detail below.
Note that power control sits at two layers, the physical and access layers, and is part of resource allocation at the
network layer as well [16]. Thus, power control spans multiple layers of the protocol stack, as discussed in more
detail below. Most ad hoc wireless network designs do not use the IP protocol for routing, since routing through a
wireless network is very different from routing in the Internet. Moreover, the addressing and subnetting in the IP protocol
are not well suited to ad hoc wireless networks. Transport protocols do not necessarily use TCP either. However,
the five-layer model is a common abstraction for the modular design of protocol layers in wireless networks. 

The layering principle for protocol design works reasonably well in wired networks like the Internet, where the 
data rates associated with the physical layer can exceed gigabits per second and packets are rarely lost. However, 
even in this setting, layering makes it difficult to support high data rate applications with hard delay constraints, 
such as video or even voice. Wireless networks can have very low physical layer data rates, with very high packet 
and bit error probabilities. In this setting protocol layering can give rise to tremendous inefficiencies and also 
precludes exploiting interactions between protocol layers for better performance. Cross-layer design considers 
multiple layers of the protocol stack together, either in terms of a joint design or in information exchange between 
the layers. Cross-layer design can exhibit tremendous performance advantages over a strictly layered approach. 
We now describe the layers of the five-layer model and their functionality in wireless networks. We then discuss 
the basic principles of cross-layer design and the performance advantages of this approach over strict layering. 

16.3.1 Physical Layer Design 

The physical layer deals primarily with transmitting bits over a point-to-point wireless link, hence it is also referred 
to as the link layer. Chapters 5-13 comprehensively cover the design tradeoffs associated with the physical layer, 
including modulation, coding, diversity, adaptive techniques, MIMO, equalization, multicarrier modulation, and 
spread spectrum. However, the design tradeoffs for a link that is part of an ad hoc wireless network impact protocol
layers above the physical layer. In fact, few aspects of the physical layer design in an ad hoc wireless network do 
not impact in some way the protocols associated with the higher layers. We now give some examples of this 
interaction for physical layer design choices related to packet error rate, multiple antennas, and power control. 

Figure 16.2: Five-Layer Model for Network Protocol Design.

In most wireless ad hoc networks bits are packetized for transmission, as described in Chapter 14.3. The
design choices at the physical layer along with the channel and interference conditions determine the link packet
error rate (PER). Many access layer protocols retransmit packets received in error, so PER based on the physical
layer design affects the retransmission requirements at the access layer. Similarly, as described in Chapter 10.8,
multiple antennas give rise to a multiplexing/diversity/directionality tradeoff: the antennas can be used to increase
the data rate on the link, to provide diversity to fading so that average BER is reduced, or to provide directionality
to reduce fading and the interference a signal causes to other signals. Diversity gain will reduce PER, leading to
fewer retransmissions. Multiplexing will increase the link rate, which reduces congestion and delay on the link and
benefits all multihop routes using that link. Directionality reduces interference to other links, thereby improving
their performance. Thus, the best use of multiple antennas in an ad hoc wireless network clearly transcends just
the physical layer; in fact, it simultaneously impacts the physical, access, network, and transport layers.
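
The coupling between PER and access layer retransmissions can be quantified with a simple model. The sketch below assumes independent packet errors and an unlimited number of retransmissions, in which case the number of transmissions per delivered packet is geometric with mean 1/(1 - PER). The function name and the sample PER values are illustrative only.

```python
def expected_transmissions(per):
    """Mean number of transmissions per delivered packet under simple ARQ.

    Assumes independent packet errors with probability `per` and unlimited
    retransmissions, so the count is geometric with mean 1 / (1 - per).
    """
    if not 0.0 <= per < 1.0:
        raise ValueError("per must lie in [0, 1)")
    return 1.0 / (1.0 - per)


for per in (0.5, 0.1, 0.01):
    print(f"PER = {per}: {expected_transmissions(per):.3f} transmissions per packet")
```

Under these assumptions, any physical layer choice that lowers PER (more diversity, more transmit power, stronger coding) translates directly into fewer transmissions at the access layer.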

The transmit power of a node at the physical layer also has a broad impact across many layers of the protocol 
stack. Increasing transmit power at the physical layer reduces PER, thereby impacting the retransmissions required 
at the access layer. In fact any two nodes in the network can communicate directly with sufficiently high transmit 
power, so this power drives link connectivity. However, a high transmit power from one node in the network can 
cause significant interference to other nodes, thereby degrading their performance and breaking their connections 
to other nodes. In particular, link performance in an ad hoc wireless network is driven by SINR, so the transmit 
power of all nodes impacts the performance of all links in the network. Broadly speaking, the transmit power 
coupled with adaptive modulation and coding for a given node defines its “local neighborhood” (the collection
of nodes that it can reach in a single hop) and thus defines the context in which access, routing, and other higher
layer protocols operate. Therefore, the transmit power of all nodes in the network must be optimized with respect
to all layers that it impacts. As such, it is a prime motivator for a cross-layer design.



16.3.2 Access Layer Design 



The access layer controls how different users share the available spectrum and ensures successful reception of packets
transmitted over this shared spectrum. Allocation of signaling dimensions to different users is done through
either multiple access or random access, and a detailed discussion of these access techniques can be found in Chapters
14.2-14.3. Multiple access divides the signaling dimensions into dedicated channels via orthogonal or non-orthogonal
channelization methods. The most common of these methods are TDMA, FDMA, and CDMA. The
access layer must also provide control functionality to assign channels to users and to deny access to users when
they cannot be accommodated in the system. In random access, channels are assigned to active users dynamically,
and in multihop networks these protocols must contend with hidden and exposed terminals. The most common
random access methods are different forms of ALOHA, CSMA, and scheduling. These random access methods
incorporate channel assignment and access denial into their protocols.

As discussed in the prior section, transmit power associated with a single node impacts all other nodes. Thus,
power control across all nodes in the network is part of the access layer functionality. The main role of power
control is to ensure that SINR targets can be met on all links in the network. This is often infeasible, as discussed in
more detail below. The power control algorithms described in Chapters 14.4 and 15.5.3 for meeting SINR targets
in multiple access and cellular systems, respectively, can be extended to ad hoc networks as follows. Consider an
ad hoc wireless network with K nodes and N links between different transmitter-receiver pairs of these nodes.$^2$
The SINR on link k is given by

$$\gamma_k = \frac{g_{kk} P_k}{n_k + \rho \sum_{j \neq k} g_{kj} P_j}, \quad k = 1, 2, \ldots, N, \tag{16.1}$$

where $g_{kj} > 0$ is the channel power gain from the transmitter of the jth link to the receiver of the kth link, $P_k$
is the power of the transmitter on the kth link, $n_k$ is the noise power of the receiver on the kth link, and $\rho$ is the
interference reduction due to signal processing, i.e. $\rho \approx 1/G$ for CDMA with processing gain G and $\rho = 1$ in
TDMA. Suppose that the kth link requires an SINR $\gamma_k^*$, determined, for example, by the connectivity and data
rate requirements for that link. Then the SINR constraints for all links can be represented in matrix form as
$(I - F)P \geq u$ with $P > 0$, where $P = (P_1, P_2, \ldots, P_N)^T$ is the column vector of transmit powers associated
with the transmitters on the N links,

$$u = \left( \frac{\gamma_1^* n_1}{g_{11}}, \frac{\gamma_2^* n_2}{g_{22}}, \ldots, \frac{\gamma_N^* n_N}{g_{NN}} \right)^T \tag{16.2}$$

is the column vector of noise powers scaled by the SINR constraints and channel gains, and F is a matrix with entries

$$F_{kj} = \begin{cases} 0, & k = j \\ \dfrac{\gamma_k^* g_{kj} \rho}{g_{kk}}, & k \neq j \end{cases} \tag{16.3}$$

for $k, j \in \{1, 2, \ldots, N\}$. As in the uplink and cellular power control problems, if the Perron-Frobenius (maximum
modulus) eigenvalue of F is less than unity then there exists a vector $P > 0$ (i.e. $P_k > 0$ for all k) such
that the SINR requirements of all links are satisfied, with $P^* = (I - F)^{-1} u$ the Pareto optimal solution. In other
words $P^*$ meets the SINR requirements with the minimum transmit power on all links. Moreover, a distributed
iterative power control algorithm where the transmitter on the kth link updates its transmit power at time i + 1 as

$$P_k(i+1) = \frac{\gamma_k^*}{\gamma_k(i)} P_k(i) \tag{16.4}$$

can be shown to converge to the optimal solution $P^*$. This is a very simple distributed algorithm for power control
in an ad hoc wireless network, since it only requires that the SINR of the kth link be made known to the transmitter
on that link. Then, if this SINR is below its target, the transmitter increases its power, and if it is above this target,
the transmitter decreases the power. It is quite remarkable that such a simple distributed algorithm converges to a
globally optimal power control. However, when the channel gains are not static, SINR constraints can no longer be
met with certainty, and it is much more difficult to develop distributed power control algorithms that meet a desired
performance target [35]. In particular, the algorithm described by (16.4) can exhibit large fluctuations in link SINR
when the channel gains vary over time. More importantly, it is often impossible to meet the SINR constraints of all
nodes simultaneously even when the link gains are static, due to the large number of interferers and the range of
channel gains associated with all signals in the network. When the SINR constraints cannot be met, the distributed
power control algorithm will diverge such that all nodes transmit at their maximum power and still cannot meet
their SINR constraints. This is obviously an undesirable operational state, especially for energy-constrained nodes.

2 As noted above, all nodes can communicate to all other nodes, so there are K(K - 1) links for a network with K nodes. However, we
will assume that only N of these are operational, so we only need consider the SINR on these N links.
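
The distributed update can be sketched in a few lines. The sketch assumes static gains, takes the SINR on a link to be received signal power divided by noise plus interference scaled by the factor rho, and has every transmitter scale its power by the ratio of target SINR to measured SINR at each step. The three-link gain matrix, noise powers, and targets are hypothetical values chosen so that the targets are feasible. No maximum power is imposed, so for infeasible targets the powers in this sketch would grow without bound.

```python
def sinr(k, powers, gains, noise, rho=1.0):
    """SINR on link k; gains[k][j] is the gain from transmitter j to receiver k."""
    interference = sum(gains[k][j] * powers[j] for j in range(len(powers)) if j != k)
    return gains[k][k] * powers[k] / (noise[k] + rho * interference)


def power_control(gains, noise, targets, rho=1.0, iterations=200, initial_power=1.0):
    """Apply the target/measured SINR power update on all links for a fixed number of steps."""
    powers = [initial_power] * len(targets)
    for _ in range(iterations):
        measured = [sinr(k, powers, gains, noise, rho) for k in range(len(powers))]
        powers = [targets[k] / measured[k] * powers[k] for k in range(len(powers))]
    return powers


# Hypothetical example: weak cross gains, so the SINR targets are feasible.
gains = [[1.0, 0.1, 0.05],
         [0.08, 1.0, 0.1],
         [0.05, 0.12, 1.0]]
noise = [0.01, 0.01, 0.01]
targets = [2.0, 2.0, 2.0]

powers = power_control(gains, noise, targets)
achieved = [sinr(k, powers, gains, noise) for k in range(len(powers))]
print("powers:", [round(p, 5) for p in powers])
print("SINRs: ", [round(s, 5) for s in achieved])
```

Each transmitter uses only its own measured SINR, yet the powers settle where every link meets its target with equality. A practical variant would clip each update at the node's maximum power.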

In [36] the distributed power control algorithm assuming static link gains is extended to include distributed 
admission control. The admission control provides protection for existing link SINR targets when a new user enters 
the system. In this scheme the active links have a slightly higher SINR target than needed. This buffer is used so 
that when a new user attempts to access the system, if he transmits at a low power level he will not cause the active 
links to fall below their minimum targets. The new user gradually ramps up his power and checks if he gets closer 
to his SINR target as a result. If the new user can be accommodated in the system without violating the SINR 
constraints of existing links then the distributed algorithm with this gradual ramp up will eventually converge to 
a new P* that satisfies the SINR constraints of the new and existing links. However, if the new user cannot be 
accommodated, his gradual ramp up will not get close to his required SINR and he will eventually be required to
leave the system. Note that the criteria for denial of access are difficult to optimize in time-varying channels with
distributed control [35]. These ideas are combined with transmission scheduling in [37, 55] to improve power
efficiency and reduce interference. The power control algorithm can also be modified to take into account delay
constraints [9, 38]. However, delay constraints are associated with the full multihop route of a packet, so power
control should be coordinated with network layer protocols to ensure delay constraints are met on the full route. 

The access layer is also responsible for retransmissions of packets received in error over the wireless link, 
often referred to as the ARQ protocol. Specifically, data packets typically have an error detection code that is used
by the receiver to determine if one or more bits in the packet were corrupted and cannot be corrected. For such 
packets, the receiver will typically discard the corrupted packet and inform the transmitter via a feedback channel 
that the packet must be retransmitted. However, rather than discarding the packet, the access layer can save it and 
use a form of diversity to combine the corrupted packet with the retransmitted packet for a higher probability of 
correct packet reception. Alternatively, rather than retransmitting the original packet in its entirety, the transmitter 
can just send some additional coded bits to provide a stronger error correction capability for the original packet to 
correct for its corrupted bits. This technique is called incremental redundancy, since the transmitter need only 
send enough redundant bits to correct for the bits corrupted in the original packet transmission. The diversity and 
incremental redundancy methods have been shown to substantially improve throughput in comparison with simple 
retransmissions [39]. 
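
The detect-and-discard step of basic ARQ can be illustrated as follows. The sketch uses CRC-32 from Python's standard `zlib` module as the error detection code; this choice, the 4-byte trailer layout, and the helper names are assumptions made for illustration. The text does not specify a particular code, and the sketch does not implement diversity combining or incremental redundancy.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corrupted packets."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(packet: bytes):
    """Return the payload if the CRC matches, else None (a retransmission would be requested)."""
    payload, received_crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    return payload if zlib.crc32(payload) == received_crc else None


def flip_bit(packet: bytes, bit_index: int) -> bytes:
    """Model a channel error by inverting one bit of the packet."""
    corrupted = bytearray(packet)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


packet = add_crc(b"sensor reading 42")
print(check_crc(packet))                 # intact packet: payload is returned
print(check_crc(flip_bit(packet, 10)))   # corrupted packet: rejected
```

A receiver that stored the rejected packet instead of discarding it could combine it with later copies, which is the diversity approach described above.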

16.3.3 Network Layer Design 

The network layer is responsible for establishing and maintaining end-to-end connections in the network. This
typically requires a network that is fully connected whereby every node in the network can communicate with
every other node, although these connections may entail multihop routing through intermediate nodes.$^3$ The
main functions of the network layer in an ad hoc wireless network are neighbor discovery, routing, and dynamic
resource allocation. Neighbor discovery is the process by which a node discovers its neighbors when it first enters
the network. Routing is another key function of the network layer: the process of determining how packets are
routed through the network from their source to their destination. Routing through intermediate nodes is typically
done by relaying, although other techniques that better exploit multiuser diversity can also be used. Dynamic
resource allocation dictates how network resources such as power and bandwidth are allocated throughout the
network, although in general resource allocation occurs at multiple layers of the protocol stack and thus requires
cross-layer design.

3 This definition differs from that in graph theory, where every node in a fully connected graph has an edge to every other node.

Neighbor Discovery and Topology Control 

Neighbor discovery is one of the first steps in the initialization of a network with randomly distributed nodes. 
From the perspective of the individual node, this is the process of determining the number and identity of network 
nodes with which direct communication can be established given some maximum power level and minimum link 
performance requirements (typically in terms of data rate and associated BER). Clearly the higher the allowed 
transmit power, the greater the number of nodes in a given neighborhood. 

Neighbor discovery typically begins with a probe of neighboring nodes using some initial transmit power.
If this power is not sufficient to establish a connection with $N \geq 1$ neighbors then transmit power is increased
and probing repeated. The process continues until N connections are established or the maximum power $P_{\max}$ is
reached. The parameter N is set based on network requirements for minimal connectivity, while $P_{\max}$ is based on
the power limitations of each node and the network design. If N and/or $P_{\max}$ is small, the network may form in a
disconnected manner, with small clusters of nodes that communicate together but cannot reach other clusters. This
is illustrated in Figure 16.3, where the dashed circles centered around a node indicate the neighborhood within
which it can establish connections with other nodes. If N and $P_{\max}$ are large then while the network is typically
fully connected, many nodes are transmitting at higher power than necessary for full network connectivity, which 
can waste power and increase interference. Once the network is fully connected a more sophisticated distributed 
power control algorithm such as (16.4) can be activated to meet target SINR levels on all links with minimal 
transmit power. Alternatively, power control can be used to create a desired topology [18]. 
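
The probing procedure can be sketched as a simple loop. The link model here is an assumption: a node at distance d counts as a neighbor when the transmit power is at least a receiver threshold times d raised to the path loss exponent. The node positions, the doubling step, and all parameter values are hypothetical; `min_neighbors` and `p_max` play the roles of N and the maximum power in the text.

```python
import math


def discover_neighbors(node, positions, min_neighbors, p_max,
                       p_init=1.0, step=2.0, path_loss_exponent=3.0, rx_threshold=1e-3):
    """Raise transmit power until `min_neighbors` nodes are reachable or `p_max` is hit.

    Hypothetical link model: a node at distance d is reachable with power p when
    p >= rx_threshold * d**path_loss_exponent. `step` must exceed 1.
    """
    power = min(p_init, p_max)
    while True:
        neighbors = [
            other for other, position in positions.items()
            if other != node
            and power >= rx_threshold * math.dist(positions[node], position) ** path_loss_exponent
        ]
        if len(neighbors) >= min_neighbors or power >= p_max:
            return power, neighbors
        power = min(power * step, p_max)


positions = {"a": (0.0, 0.0), "b": (5.0, 0.0), "c": (0.0, 12.0), "d": (40.0, 30.0)}
print(discover_neighbors("a", positions, min_neighbors=2, p_max=100.0))
print(discover_neighbors("a", positions, min_neighbors=3, p_max=100.0))
```

In the second call node "d" is out of reach even at maximum power, so node "a" ends up transmitting at its power limit with only two neighbors. This is the situation in which a network can form as disconnected clusters.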

The exact number of neighbors that each node requires to obtain a fully-connected network depends on the 
exact network configuration and channel characteristics, but is generally on the order of six to eight for randomly 
distributed immobile nodes with channels characterized by path loss alone [3, 31]. The number required for 
full connectivity under more general assumptions is analyzed in [40, 41, 42]. As node mobility increases, links 
typically experience large gain variations due to fading. These variations can make it harder for the network to stay 
fully connected at all times unless the nodes can increase their transmit powers to compensate for instantaneous 
fading. If the data is delay-tolerant then fading can actually improve network connectivity since it provides network
diversity [90]. As network density decreases, network connectivity typically suffers [43, 41, 44, 45]. Connectivity 
is also heavily influenced by the ability to adapt various parameters at the physical layer such as rate, power, and 
coding, since communication is possible even on links with low SINR if these parameters are adapted. 

Routing 

The routing protocol in an ad hoc wireless network is a significant design challenge, especially under node mobility 
where routes must be dynamically reconfigured due to rapidly changing connectivity. There is broad and extensive 
work spanning several decades on routing protocols for ad hoc wireless networks, which is difficult to classify in 
a simple manner. We will focus on three main categories of routing protocols: flooding, proactive (centralized, 
source-driven, or distributed), and reactive routing [46, Chapter 5].

Figure 16.3: Disconnected Network.

In flooding, a packet is broadcast to all nodes within receiving range. These nodes also broadcast the packet,
and the forwarding continues until the packet reaches its ultimate destination. Flooding has the advantage that
it is highly robust to changing network topologies and requires little routing overhead. In fact, in highly mobile
networks flooding may be the only feasible routing strategy. The obvious disadvantage is that multiple copies of
the same packet traverse the network, wasting bandwidth and battery power of the transmitting nodes.
This disadvantage makes flooding impractical for all but the smallest of networks.
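
The cost of flooding can be made concrete with a small sketch. It assumes that each node rebroadcasts only the first copy of a packet it hears (duplicate suppression, which the text does not specify) and that the destination does not rebroadcast. The five-node topology and the function name are hypothetical.

```python
from collections import deque


def flood(adjacency, source, destination):
    """Flood a packet from `source`; each node rebroadcasts the first copy it hears.

    Returns (reached, broadcasts): whether the destination received the packet
    and how many broadcasts were made across the whole network.
    """
    seen = {source}
    queue = deque([source])
    broadcasts = 0
    while queue:
        node = queue.popleft()
        if node == destination:
            continue  # the destination consumes the packet and does not rebroadcast
        broadcasts += 1
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return destination in seen, broadcasts


adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(flood(adjacency, "A", "E"))
```

Even with duplicate suppression, every node other than the destination transmits once. A single route from A to E through B and D would need only three transmissions, and the gap widens as the network grows.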

The opposite philosophy to flooding is centralized route computation. In this approach information about 
channel conditions and network topology is determined by each node and forwarded to a centralized location
that computes the routing tables for all nodes in the network. These tables are then communicated to the nodes.
The optimal route depends on the optimization criterion. Common criteria for
route optimization include minimum average delay, minimum number of hops, and minimum network congestion
[47]. In general these criteria correspond to a cost associated with each hop along a route. The minimum cost
route between the source and destination is obtained using classic optimization techniques such as the Bellman-Ford
or Dijkstra algorithm [48], and this form of routing is also called link state routing. While centralized
route computation provides the most efficient routing according to the optimality condition, it cannot adapt to 
fast changes in the channel conditions or network topology, and also requires much overhead for periodically 
collecting local node information and then disseminating the routing information. Centralized route computation, 
like flooding, is typically only used in very small networks. 
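
Once per-hop costs are known at a central location, the minimum cost route can be computed with Dijkstra's algorithm. The sketch below is a standard textbook implementation. The topology and the cost values are hypothetical, and the costs could stand for delay, hop count, or congestion as described above.

```python
import heapq


def min_cost_route(link_costs, source, destination):
    """Dijkstra's algorithm over nonnegative per-hop costs.

    `link_costs[a][b]` is the cost of the hop from a to b.
    Returns (total_cost, route), or (inf, []) if no route exists.
    """
    best = {source: 0.0}
    previous = {}
    frontier = [(0.0, source)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == destination:
            route = [node]
            while route[-1] != source:
                route.append(previous[route[-1]])
            return cost, route[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry for a node already settled at lower cost
        for neighbor, hop_cost in link_costs.get(node, {}).items():
            new_cost = cost + hop_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    return float("inf"), []


link_costs = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.5, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
print(min_cost_route(link_costs, "A", "D"))
```

The computation itself is cheap. The expense of the centralized approach lies in collecting the link costs from every node and redistributing the resulting tables whenever conditions change.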

A variation on centralized route computation is source-driven routing, where each node obtains connectivity 
information about the entire network which is then used to calculate the best route from the node to its desired 
destination. Source-driven routing must also periodically collect network connectivity information, which entails 
significant overhead. Both centralized and source-driven routing can be combined with hierarchical routing, where 
nodes are grouped into a hierarchy of clusters, and routing is performed within a cluster at each level of the
hierarchy. 

Distributed route computation is the most common routing procedure used in ad hoc wireless networks. In 
this protocol nodes send their connectivity information to neighboring nodes and then routes are computed from 
this local information. In particular, nodes determine the next hop in the route of a packet based on this local
information. There are several advantages to distributed route computation. First, the overhead of exchanging routing
information with local nodes is minimal. In addition, this strategy adapts quickly to link and connectivity changes.
The disadvantages of this strategy are that global routes based on local information are typically suboptimal, and
routing loops are common in distributed route computation. These loops are avoided with the destination-sequenced
distance vector (DSDV) protocol by having sequence numbers as part of the routing tables [49].
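
A minimal sketch of distributed route computation is the synchronous distance-vector exchange below, in which each node updates its table using only the tables advertised by its direct neighbors. The topology and link costs are hypothetical, links are assumed symmetric and static, and the sequence numbers that DSDV uses for loop avoidance are not modeled.

```python
def distance_vector(links, rounds=None):
    """Synchronous distance-vector routing: nodes learn routes only from neighbors' tables.

    `links[a][b]` is the cost of the link between neighbors a and b (assumed symmetric).
    Returns tables[node][destination] = (cost, next_hop).
    """
    nodes = list(links)
    tables = {n: {n: (0.0, n)} for n in nodes}
    for _ in range(rounds if rounds is not None else len(nodes) - 1):
        snapshot = {n: dict(t) for n, t in tables.items()}  # tables advertised this round
        for node in nodes:
            for neighbor, link_cost in links[node].items():
                for dest, (cost, _) in snapshot[neighbor].items():
                    candidate = link_cost + cost
                    if dest not in tables[node] or candidate < tables[node][dest][0]:
                        tables[node][dest] = (candidate, neighbor)
    return tables


links = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.5},
    "C": {"A": 4.0, "B": 1.5, "D": 1.0},
    "D": {"C": 1.0},
}
tables = distance_vector(links)
print(tables["A"]["D"])  # (cost, next hop) from A toward D
```

Node A never learns the full route to D. It only knows that forwarding to B gives the lowest advertised cost, which is the next-hop decision described above. With static links the tables settle after at most one round fewer than the number of nodes; when links change between rounds, stale entries are what give rise to routing loops.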

Both centralized and distributed routing require fixed routing tables that must be updated at regular intervals. 
An alternate approach is reactive (on-demand) routing, where routes are created only at the initiation of a source
node that has traffic to send to a given destination. This eliminates the overhead of maintaining routing tables for
routes not currently in use. In this strategy a source node initiates a route-discovery process when it has data to
send. This process will determine if one or more routes are available to the destination. The route or routes are
maintained until the source has no more data for that particular destination. The advantage of reactive routing
is that globally-efficient routes can be obtained with relatively little overhead, since these routes need not be 
maintained at all times. The disadvantage is that reactive routing can entail significant initial delay, since the route 
discovery process is initiated when there is data to send, but transmission of this data cannot commence until the 
route discovery process has concluded. The most common protocols for on-demand routing are ad hoc on-demand 
distance vector routing (AODV) [52] and dynamic source routing (DSR) [51]. Reactive and proactive routing are 
combined in a hybrid technique called the zone routing protocol (ZRP), which reduces the delay associated with 
reactive routing as well as the overhead associated with proactive routing [50]. 

Mobility has a huge impact on routing protocols as it can cause established routes to no longer exist. High 
mobility especially degrades the performance of proactive routing, since routing tables quickly become outdated, 
requiring an enormous amount of overhead to keep them up to date. Flooding is effective in maintaining routes 
under high mobility, but has a huge price in terms of network efficiency. A modification of flooding called multipath 
routing can be very effective without adding significant overhead. In multipath routing a packet is duplicated on 
only a few end-to-end paths between its source and destination. Since it is unlikely that the duplicate packets 
are lost or significantly delayed on all paths simultaneously, the packet has a high probability of reaching its final 
destination with minimal delay on at least one of the paths [53]. This technique has been shown to perform well 
under dynamically changing topologies. 
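
The benefit of duplicating a packet on a few paths can be estimated under the assumption that losses on the paths are independent: the packet fails to arrive only if every copy is lost. The per-path success probabilities below are hypothetical, and paths that share links or interfere with each other would violate the independence assumption.

```python
def delivery_probability(path_success_probs):
    """Probability that at least one duplicate arrives, assuming independent path losses."""
    all_fail = 1.0
    for p in path_success_probs:
        all_fail *= 1.0 - p
    return 1.0 - all_fail


single = delivery_probability([0.8])
triple = delivery_probability([0.8, 0.8, 0.8])
print(f"one path: {single:.3f}, three paths: {triple:.3f}")
```

Most of the reliability gain comes from the first few duplicates, which is consistent with using only a few end-to-end paths rather than flooding the whole network.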

The routing protocol is based on an underlying network topology: packets can only be routed over links 
between two nodes. However, as described earlier, the definition of connectivity between two nodes is somewhat
flexible: it depends on the SINR of the link as well as the physical layer design, which determines the required SINR
for data to be reliably transmitted over the link. The access layer also plays a role in connectivity, since it dictates 
the interference between links. Thus, there is significant interaction between the physical, access, and network 
layers [54]. This interaction was investigated in [55], where it was found that if CSMA/CA access is coupled with a 
routing protocol that uses links with a low SINR, network throughput is significantly reduced. Another interesting 
result in [55] is that maintaining a single route between any source-destination pair is suboptimal in terms of 
total network throughput. Multiplexing between multiple routes associated with any given source-destination pair 
provides an opportunity to change the interference that pair causes to other end-to-end routes, and this diversity 
can be exploited to increase network throughput. 

Routing algorithms can also be optimized for requirements associated with higher layer protocols, in particular 
delay and data rate requirements of the application layer. Such algorithms are referred to as QoS routing. The goal 
of QoS routing is to find routes through the network that can satisfy the end-to-end delay and data rate requirements 
specified by the application. Examples of QoS routing and its performance are given in [56, 57, 58]. 

Most routing protocols use a decode-and-forward strategy at each relay node, where packets received by the 
relay are decoded to remove errors through error correction and retransmissions are requested when errors are detected
that cannot be corrected. An alternate strategy is amplify-and-forward, where the relay node simply retransmits 
the packet it has received without attempting to remove errors or detect corrupted packets. This simplifies the 
relay design, reduces processing energy at the relay, and reduces delay. However, amplify-and-forward does 
not work well in a wireless setting, since each wireless link is unreliable and often introduces errors, which get 
compounded on each hop of a route. An alternative to these two strategies is cooperative diversity, where the 
diversity associated with spatially distributed users is exploited in forwarding packets [59, 60]. This idea was 
originally proposed in [61, 62], where multiple transmitters cooperate by repeating detected symbols of the other, 
thereby forming a repetition code with spatial diversity. These ideas have led to more sophisticated cooperative 
coding techniques [63] along with forms of cooperative diversity other than coding [64]. Finally, network coding 
fuses data received along multiple routes to increase network capacity [65, 66, 67]. While network coding has been
primarily applied to multicasting in wired networks, it can also be used in a wireless setting [68].



Resource Allocation and Flow Control 



A routing protocol dictates the route a packet should follow from a source node to its destination. When the 
routing optimization is based on minimum congestion or delay, routing becomes intertwined with flow control, 
which typically sits at the transport layer. If the routing algorithm sends too much data over a given link, that link 
becomes congested, so that the routing algorithm must change to a different route to avoid this link. Moreover, the 
delay associated with a given link is a function of the link data rate or capacity: the higher the capacity, the more 
data that can flow over that link with minimal delay. Since link capacity depends on the resources allocated to the 
link, in particular transmit power and bandwidth, we see that routing, resource allocation, and flow control are all 
interdependent. 

The classic metric for delay on a link from node i to node j, neglecting processing and propagation delay, is 
[31, Chapter 5.4] 

$$D_{ij} = \frac{f_{ij}}{C_{ij} - f_{ij}}, \qquad (16.5)$$

where $f_{ij}$ is the traffic flow assigned to the link and $C_{ij}$ is its capacity. This formula has its roots in queueing 
theory, and provides a good metric in practice, since the closer the flow is to the maximum data rate on a given 
link, the more likely the link will get congested and incur delay. Another metric on the link between nodes i and j 
is the link utilization, given by 

$$D_{ij} = \frac{f_{ij}}{C_{ij}}. \qquad (16.6)$$



As discussed in [31, Chapter 5.4], this metric has properties that are comparable to those of the delay metric 
(16.5), and is also a quasi-convex function 4 of both flow and capacity, which allows efficient convex optimization 
methods to be applied in the routing computations. If the data flows across links in the network are fixed, then the 
routing algorithm can compute the per-hop cost based on the delay metric (16.5) or the utilization metric (16.6) to 
find the minimum cost route through the network. The difference between these two metrics is that delay grows 
asymptotically large as the flow approaches link capacity, whereas link utilization approaches unity. Thus, the delay 
metric (16.5) has a much higher cost than the utilization metric (16.6) when flow is assigned to a link operating 
at close to its capacity. Once the new route is established, it will change the link flows along that route. While in 
most cases this change in flow will not be large, since the contribution of any one node to overall traffic is small, in 
small to moderately-sized networks a demanding application such as video can cause significant self-congestion. 
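
The difference between the two metrics can be illustrated with a small Python sketch that uses each metric as the per-hop cost in a shortest-path search. The four-node topology, the flow and capacity values, and all function names are assumptions made for illustration only, and the delay cost is taken to be the queueing form f/(C - f).

```python
import heapq

def delay_metric(flow, capacity):
    """Queueing-style delay cost f / (C - f); infinite when the link is saturated."""
    if flow >= capacity:
        return float("inf")
    return flow / (capacity - flow)

def utilization_metric(flow, capacity):
    """Link utilization f / C."""
    return flow / capacity

def min_cost_route(links, source, dest, metric):
    """Dijkstra's algorithm over directed links {(i, j): (flow, capacity)}."""
    graph = {}
    for (i, j), (flow, cap) in links.items():
        graph.setdefault(i, []).append((j, metric(flow, cap)))
    best = {source: 0.0}
    heap = [(0.0, source, [source])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dest:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical example: the direct link A->D is nearly saturated,
# while the three-hop detour is lightly loaded.
links = {
    ("A", "D"): (9.5, 10.0),
    ("A", "B"): (4.0, 10.0),
    ("B", "C"): (4.0, 10.0),
    ("C", "D"): (4.0, 10.0),
}
delay_cost, delay_path = min_cost_route(links, "A", "D", delay_metric)
util_cost, util_path = min_cost_route(links, "A", "D", utilization_metric)
print(delay_path, util_path)
```

With these assumed numbers the delay metric steers traffic around the nearly saturated link, whereas the utilization metric still prefers the single congested hop, which is the behavior the comparison above describes.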

The link metrics (16.5) and (16.6) assume a fixed link capacity $C_{ij}$. However, this capacity is a function of the 
SINR on the link as well as the bandwidth allocated to that link. By dynamically allocating network resources such 
as power and bandwidth to congested links, their capacities can be increased and their delay reduced. However, this 
may take away resources from other links, thereby decreasing their capacity. These changes in link capacity will in 
turn change the link metrics used to compute optimal routes, and will ultimately affect the overall performance of 
the network. Hence, the performance of the network depends simultaneously on routing, flow control, and resource 
allocation. 
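
As a rough sketch of this coupling, the Python fragment below shifts bandwidth from a lightly loaded link to a congested one and recomputes the delay cost of each. The Shannon formula C = B log2(1 + SINR) is used here as an assumed capacity model, and the bandwidths, SINR, and flows are hypothetical values chosen only to make the effect visible.

```python
import math

def shannon_capacity(bandwidth_hz, sinr_linear):
    """Capacity in bits/s under the assumed model C = B log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)

def link_delay(flow, capacity):
    """Delay cost f / (C - f); infinite if the flow meets or exceeds capacity."""
    return flow / (capacity - flow) if flow < capacity else float("inf")

# Hypothetical values: two links share 2 MHz; link 1 is congested, link 2 is not.
sinr = 15.0                      # linear SINR, assumed equal on both links
flow1, flow2 = 3.8e6, 1.0e6      # offered flows in bits/s

def delays(bw1_hz, total_bw_hz=2.0e6):
    """Delay costs of both links when link 1 is given bw1_hz of the shared band."""
    c1 = shannon_capacity(bw1_hz, sinr)
    c2 = shannon_capacity(total_bw_hz - bw1_hz, sinr)
    return link_delay(flow1, c1), link_delay(flow2, c2)

equal_split = delays(1.0e6)      # 1 MHz each
shifted = delays(1.4e6)          # move 0.4 MHz from link 2 to link 1
print(equal_split, shifted)
```

Moving bandwidth to the congested link lowers its delay cost sharply while raising the cost of the other link, so any route computed from the old link metrics may no longer be the minimum-cost route.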

The joint optimization of flow control, routing, and resource allocation can be formulated as a convex opti- 
mization problem over the flow and communications variables as long as the cost and capacity functions are convex 
or quasi-convex. Interior-point convex optimization methods can then be applied to solve for the optimal design. 
This approach has been investigated in [69, 70, 71] for both TDMA and CDMA wireless networks to minimize 
power, maximize link utilization, or maximize flow utility through joint routing and resource allocation. Similar 
ideas using iterative optimization were explored in [72]. Maximizing throughput in this setting can lead to highly 
unfair allocation of system resources [73], although the framework can be modified to include fairness constraints 
[74]. 

4 A function is quasi-convex if the set over which its value is below any given threshold is convex. 



16.3.4 Transport Layer Design 

The transport layer provides the end-to-end functions of error recovery, retransmission, reordering, and flow 
control. While individual links provide error detection and retransmissions, these mechanisms are not foolproof. The 
transport layer provides an extra measure of protection by monitoring for corrupt or lost packets on the end-to-end 
route, and requesting a retransmission from the original source node if a packet is determined to be lost or cor- 
rupted. In addition, packets may arrive out of order due to multipath routing, delays and congestion, or packet loss 
and retransmission. The transport layer serves to order packets transmitted over an end-to-end route before passing 
them to the application layer. 

The transport layer also provides flow control for the network, allocating flows associated with the application 
layer to different routes. The TCP protocol for the transport layer does not work well in wireless networks, since 
it assumes all lost packets are due to congestion, and invokes congestion control as a result. In wired networks 
congestion is the primary reason for packet loss, so the TCP protocol works well, and that is why it is used for 
the transport layer of the Internet. However, in wireless networks packets are mostly lost due to fading and node 
mobility. Invoking congestion control in this case can lead to extreme inefficiency [46, Chapter 11.5]. There has 
been some progress on developing mechanisms to improve TCP performance for wireless networks by providing 
transport-layer feedback about link failures, with somewhat limited success. 
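
The inefficiency can be seen in a deliberately simplified Python sketch of additive-increase, multiplicative-decrease window control. This is not a model of any actual TCP implementation: losses are injected at a fixed, assumed period rather than drawn from a fading model, and the function name, window cap, and loss periods are hypothetical.

```python
def aimd_average_window(num_rtts, loss_period, max_window=64):
    """Average congestion window of a simplified AIMD sender.

    The window grows by one packet per round trip and is halved whenever a
    loss is observed. A loss is injected every `loss_period` round trips,
    regardless of whether the network is actually congested.
    """
    window = 1.0
    total = 0.0
    for rtt in range(1, num_rtts + 1):
        total += window
        if rtt % loss_period == 0:
            window = max(1.0, window / 2.0)          # multiplicative decrease
        else:
            window = min(max_window, window + 1.0)   # additive increase
    return total / num_rtts

rare_loss = aimd_average_window(1000, loss_period=100)   # wired-like: losses are rare
fading_loss = aimd_average_window(1000, loss_period=5)   # wireless-like: frequent losses
print(rare_loss, fading_loss)
```

When losses are frequent, the sender keeps halving its window even though the links are not congested, so the average window stays far below what the network could carry.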

In general, flow control in wireless networks is intricately linked to resource allocation and routing, as de- 
scribed in Section 16.3.3. This interdependency is much tighter in wireless networks than in their wired counter- 
parts. In particular, wired networks have links with fixed capacity, whereas the capacity of a wireless link depends 
on the interference between links. Traffic flows assigned to a given link will cause interference to other links, 
thereby affecting their capacity and delay. This interdependency makes it difficult to separate out the functions of 
flow control, resource allocation, and routing into separate network and transport layers, motivating a cross-layer 
design between them. 

16.3.5 Application Layer Design 

The application layer generates the data to be sent over the network and processes the corresponding data received 
over the network. As such, this layer provides compression of the application data along with error correction and 
concealment. The compression must be lossless for data applications, but can be lossy for video, voice, or image 
applications, where some loss can be tolerated in the reconstruction of the original data. The higher the level of 
compression, the less the data rate burden imposed on the network. However, highly compressed data is more 
sensitive to errors since most of the redundancy is removed. Data applications cannot tolerate any loss, so packets 
that are corrupted or lost in the end-to-end transmission must be retransmitted, which can entail significant delay. 
Voice, video, and image applications can tolerate some errors, and techniques like error concealment or adaptive 
playback can mitigate the impact of these errors on the perceived quality at the receiving end [75, 76]. Thus, a 
tradeoff at the application layer is data rate versus robustness: the higher the data rate, the more it burdens the 
network, but the more robust the data is to poor network performance. 

The application layer can also provide a form of diversity through multiple description coding (MDC) [77, 78]. 
MDC is a form of compression whereby multiple descriptions of the data are generated. The original data can be 
reconstructed from any of these descriptions with some loss, and the more descriptions that are available, the 
better the reconstruction. If multiple descriptions of the source data are sent through the network, some of these 
descriptions can be lost, delayed, or corrupted without significantly degrading overall performance. Thus, MDC 
provides a form of diversity at the application layer against unreliable network performance. Moreover, MDC can 
be combined with multipath routing to provide cross-layer diversity in both the application description and the 
routes over which these descriptions are sent [79]. The tradeoff is that for a given data rate, MDC entails a 
higher resolution loss than a compression technique that is not geared to providing multiple descriptions. This can 
be viewed as a performance-diversity tradeoff: the application sacrifices some level of performance in order to 
provide robustness to uncertainty in the network. 
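
A minimal Python sketch of the idea, assuming the simplest possible scheme: the source is split into two descriptions made of its even-indexed and odd-indexed samples. This splitting scheme, the sinusoidal test signal, and the function names are illustrative assumptions and are not taken from [77, 78]; practical MDC designs are considerably more sophisticated.

```python
import math

def encode_descriptions(samples):
    """Split a signal into two descriptions: even-indexed and odd-indexed samples."""
    return samples[0::2], samples[1::2]

def decode(even=None, odd=None):
    """Reconstruct from whichever descriptions arrived.

    With both descriptions the signal is recovered exactly by interleaving.
    With only one, each missing sample is filled by repeating its received
    neighbor, which gives a coarser reconstruction.
    """
    if even is not None and odd is not None:
        out = []
        for i in range(len(even) + len(odd)):
            out.append(even[i // 2] if i % 2 == 0 else odd[i // 2])
        return out
    received = even if even is not None else odd
    if received is None:
        raise ValueError("no description received")
    return [value for value in received for _ in range(2)]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

signal = [math.sin(2 * math.pi * n / 32) for n in range(64)]  # hypothetical smooth source
d_even, d_odd = encode_descriptions(signal)
both = decode(d_even, d_odd)
only_even = decode(even=d_even)
only_odd = decode(odd=d_odd)
print(mse(both, signal), mse(only_even, signal), mse(only_odd, signal))
```

Either description alone yields a usable but distorted reconstruction, and both together recover the source exactly. The sketch does not capture the rate penalty of MDC relative to a single-description coder.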

Many applications require a guaranteed end-to-end data rate and delay for good performance, collectively 
referred to as QoS. The Internet today, even with high-speed high-quality fixed communication links, is unable to 
deliver guaranteed QoS to the application in terms of guaranteed end-to-end rates or delays. For ad hoc wireless 
networks, with low-capacity error-prone time-varying links, mobile users, and a dynamic topology, the notion of 
being able to guarantee these forms of QoS is simply unrealistic. Therefore, ad hoc wireless network applications 
must adapt to time-varying QoS parameters offered by the network. While adaptivity in the physical, access, 
and network layers, as described in previous sections, will provide the best possible QoS to the application, this 
QoS will vary with time as channel conditions, network topology, and user demands change. Applications must 
therefore adapt to the QoS that is offered. There can also be a negotiation for QoS such that users with a higher 
priority can obtain a better QoS by lowering the QoS of less important users. 

As a simple example, the network may offer the application a rate-delay tradeoff curve that is derived from the 
capabilities of the lower layer protocols [80]. The application layer must then decide at which point on this curve 
to operate. Some applications may be able to tolerate a higher delay but not a lower overall rate. Examples include 
data applications in which the overall data rate must be high but latency might be tolerable. Other applications 
might be extremely sensitive to delay (e.g. a distributed-control application) but might be able to tolerate a lower 
rate (e.g. via a coarser quantization of sensor data). Lossy applications like voice or video might exchange some 
robustness to errors for a higher data rate. Energy constraints introduce another set of tradeoffs related to network 
performance versus longevity. Thus, the tradeoff curves in network design will typically be multidimensional, 
incorporating rate, delay, robustness, and longevity tradeoffs. These tradeoffs will also change with time as the 
number of users in the network and the network environment change. 
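
The choice of operating point can be sketched in Python as follows. The tradeoff curve, the selection rule (highest feasible rate, ties broken by lowest delay), and the function name are hypothetical; a real application might weigh rate and delay quite differently.

```python
def choose_operating_point(curve, min_rate=0.0, max_delay=float("inf")):
    """Pick a (rate, delay) point offered by the network that meets the constraints.

    Among feasible points, prefer the highest rate and break ties by the lowest
    delay. Returns None when no offered point is feasible, in which case the
    application would have to relax its requirements.
    """
    feasible = [(r, d) for r, d in curve if r >= min_rate and d <= max_delay]
    if not feasible:
        return None
    return max(feasible, key=lambda point: (point[0], -point[1]))

# Hypothetical tradeoff curve: (rate in kbps, delay in ms); higher rate costs more delay.
curve = [(64, 20), (128, 45), (256, 110), (512, 300)]

bulk_data = choose_operating_point(curve, min_rate=256)      # delay-tolerant, rate-hungry
control = choose_operating_point(curve, max_delay=50)        # delay-sensitive
infeasible = choose_operating_point(curve, min_rate=512, max_delay=50)
print(bulk_data, control, infeasible)
```

The delay-tolerant data application ends up at the high-rate end of the curve, the control application at the low-delay end, and a request for both high rate and low delay cannot be met and must be renegotiated.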

16.4 Cross-Layer Design 

The lack of backbone infrastructure, decentralized control, and the unique characteristics of wireless links make 
it difficult to support demanding applications over ad hoc wireless networks, especially applications with high 
data rate requirements and hard delay constraints. The layering approach to wireless (and wired) network design, 
where each layer of the protocol stack is oblivious to the design and operation of other layers, has not worked 
well in general, especially under stringent performance requirements. Layering precludes the benefits of joint 
optimization discussed in prior sections. Moreover, good protocol designs for isolated layers often interact in neg- 
ative ways across layers, which can significantly degrade end-to-end performance as well as making the network 
extremely fragile to network dynamics and interference. Thus, stringent performance requirements for wireless 
ad hoc networks can only be met through a cross-layer design. Such a design requires that the interdependencies 
between layers be characterized, exploited, and jointly optimized. Cross-layer design clearly requires information 
exchange between layers, adaptivity to this information at each layer, and diversity built into each layer to ensure 
robustness. 

While cross-layer design can be applied to both wireless and wired networks, wireless ad hoc networks pose 
unique challenges and opportunities for this design framework due to the characteristics of their physical layer. 
The existence of a link between nodes, which can be used to communicate between the nodes or cause them to 
interfere with each other, can be controlled by adaptive protocols such as adaptive modulation and coding, adaptive 
space-time signal processing, and adaptive power control. Since higher layer protocols (access and routing) depend 
on underlying node connectivity and interference, adaptivity at the physical layer can be exploited by higher layer 
protocols to achieve better performance. At the same time, some links exhibit extreme congestion or fading. 
Higher layer protocols can bypass such links through adaptive routing, thereby minimizing delays and bottlenecks 
that arise due to weak links. At the highest layer, information about the throughput and delay of end-to-end routes 
can be used to change the compression rate of the application or send data over multiple routes via MDCs. Thus, 
higher layer protocols can adapt to the status of lower layers. 

Adaptation at each layer of the protocol stack should compensate for variations at that layer based on the 
time scale of these variations. Specifically, variations in link SINR are very fast, on the order of microseconds 
for fast fading. Network topology changes more slowly, on the order of seconds, while variations of user traffic 
based on their applications may change over tens to hundreds of seconds. The different time scales of the network 
variations suggest that each layer should attempt to compensate for variation at that layer first. If adapting locally is 
unsuccessful then information should be exchanged with higher layers for a broader response to the problem. For 
example, suppose the link SINR in an end-to-end route is low. By the time this connectivity information is relayed 
to a higher level of the protocol stack (i.e. the network layer for rerouting or the application layer for reduced- 
rate compression), the link SINR will most likely have changed. Therefore, it makes sense for each protocol 
layer to adapt to variations that are local to that layer. If this local adaptation is insufficient to compensate for the 
local performance degradation then the performance metrics at the next layer of the protocol stack will degrade as a 
result. Adaptation at this next layer may then correct or at least mitigate the problem that could not be fixed through 
local adaptation. For example, consider again a low SINR link. Link SINR can be measured quite accurately and 
quickly at the physical layer. The physical layer protocol can therefore respond to the low SINR by increasing 
transmit power or the level of error correction coding. This will correct for variations in connectivity due to, 
for example, multipath flat-fading. However, if the weak link is caused by something difficult to correct for at the 
physical layer, e.g. the mobile unit is inside a tunnel, then it is better for a higher layer of the network protocol stack 
to respond by, for example, delaying packet transmissions until the mobile leaves the tunnel. Similarly, if nodes in 
the network are highly mobile then link characteristics and network topology will change rapidly. Informing the 
network layer of highly-mobile nodes might change the routing strategy from unicast to broadcast in the general 
direction of the intended user. Ultimately, if the network cannot deliver the QoS requested by the application, then 
the application can adapt to whatever QoS is available. It is this integrated approach to adaptive networking - how 
each layer of the protocol stack should respond to local variations given adaptation at higher layers - that comprises 
an adaptive cross-layer protocol design. 

Diversity is another mechanism to be exploited in cross-layer design. Diversity is commonly used to provide 
robustness to fading at the physical layer. However, the basic premise of diversity can be extended across all 
layers in the network protocol stack. Cooperative diversity provides diversity at the access layer by using multiple 
spatially-distributed nodes to aid in forwarding a given packet. This provides diversity against packet corruption 
on any one link. Network layer diversity is inherent to multipath routing, such that multiple routes through the 
network are used to send a single packet. This induces a similar diversity/throughput tradeoff at the network layer 
as was described for MIMO systems at the physical layer in Chapter 10.5. Specifically, a packet transmitted over 
multiple routes through the network is unlikely to get dropped or significantly delayed simultaneously on all routes. 
Thus, the packet dropping probability and average delay are decreased by the network diversity. However, the packet 
utilizes network resources that could be used to send other packets, thereby reducing overall network throughput. 
Application layer diversity follows from using MDCs to describe the application data, such that as long as one of 
the descriptions is received, the source data can be reproduced, albeit with higher distortion than if the reproduction 
is based on all descriptions. Diversity across all layers of the protocol stack, especially when coupled with adaptive 
cross-layer design, can ensure reliability and good performance over wireless ad hoc networks despite their inherent 
challenges. 
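
The diversity/throughput tradeoff of multipath routing can be made concrete with a toy Python model. It assumes that each route drops a packet independently with the same probability and that replicating a packet over several routes uses capacity that could have carried distinct packets. The independence assumption, the number of routes, and the drop probability are all hypothetical.

```python
def multipath_tradeoff(drop_prob, copies, total_routes=4, route_rate=1.0):
    """Return (packet loss probability, goodput) when each packet is sent over
    `copies` routes out of `total_routes` available routes.

    Assumes independent drops with probability `drop_prob` on every route.
    """
    if total_routes % copies != 0:
        raise ValueError("copies must divide total_routes in this simple model")
    loss = drop_prob ** copies                     # lost only if every copy is dropped
    distinct_packets = total_routes // copies      # distinct packets carried at once
    goodput = distinct_packets * route_rate * (1.0 - loss)
    return loss, goodput

results = {n: multipath_tradeoff(0.2, n) for n in (1, 2, 4)}
for copies, (loss, goodput) in results.items():
    print(copies, loss, goodput)
```

Under these assumptions each additional copy lowers the loss probability geometrically while reducing the number of distinct packets the network can carry, which mirrors the tradeoff described above.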

Cross-layer design across multiple protocol layers below the application layer was discussed in the preceding 
sections. Cross-layer design that includes the application layer along with lower layers is a difficult challenge 
requiring interdisciplinary expertise, and there is little work addressing this challenge to date. However, the 
potential performance gains are significant, as illustrated by the cross-layer designs in [81, 82, 83] for video and image 
transmission in ad hoc wireless networks. 

Cross-layer design is particularly important in energy-constrained networks, where each node has a finite 
amount of energy that must be optimized across all layers of the protocol stack. Energy constraints pose unique 
challenges and opportunities for cross-layering. Some of these design issues will be discussed in Section 16.6.4. 

16.5 Network Capacity Limits 

Determining the fundamental capacity limits of an ad hoc wireless network - the set of maximum data rates possible 
between all nodes - is a highly challenging problem in information theory. For a network of K nodes, each node can 
communicate with $K - 1$ other nodes, so the capacity region has dimension $K(K - 1)$ for sending independent 
information between nodes. The capacity region with common information or multicasting is much larger. Even 
for a small number of nodes, the capacity for simple channel configurations within an ad hoc wireless network 
such as the general relay and interference channel remains unsolved [84]. While rate sums across any cutset of the 
network are bounded by the corresponding mutual information [84, Theorem 14.10.1], simplifying this formula into 
a tractable expression for the ad hoc network capacity region is an immensely complex problem. 

Given that the entire capacity region appears intractable to find, insights can be obtained by focusing on a less 
ambitious goal. A landmark result by Gupta and Kumar in [85] obtained scaling laws for network throughput as 
the number of nodes in the network K grows asymptotically large. They found that the throughput in terms of bits 
per second for each node in the network decreases with K at a rate between $1/\sqrt{K \log K}$ and $1/\sqrt{K}$. In other 
words the per-node rate of the network goes to zero, although the total network throughput, equal to the per-node 
rate multiplied by K, grows at a rate between $\sqrt{K/\log K}$ and $\sqrt{K}$. This surprising result indicates that even with 
optimal routing and scheduling, the per-node rate in a large ad hoc wireless network goes to zero. The reason is 
that intermediate nodes spend much of their resources forwarding packets for other nodes, so few resources are 
left to send their own data. To some extent this is a pessimistic result, since it assumes that nodes choose their 
destination node at random, whereas in many networks communication between nodes is mostly local. This work 
was extended in [86] to show that if mobile nodes can transmit information by physically transporting it close to 
its desired destination then node mobility actually increases the per-node rate to a constant, i.e. mobility increases 
network capacity. The increase follows from the fact that mobility introduces variation in the network that can be 
exploited to improve per-user rates. However, in order to exploit the variations due to mobility, significant delay 
can be incurred. The tradeoff between throughput and delay in asymptotically large fixed and mobile networks was 
characterized in [87, 88]. Similar ideas were applied to finite size networks and networks with relays in [41, 89]. 
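
To get a feel for the scaling laws of [85], the following Python sketch evaluates the two bounding orders for a few network sizes. Scaling laws hide constant factors, so the numbers are meaningful only relative to one another; the use of the natural logarithm and the function names are assumptions of this sketch.

```python
import math

def per_node_bounds(num_nodes):
    """Scaling orders (constants omitted) for per-node throughput:
    1/sqrt(K log K) and 1/sqrt(K)."""
    lower = 1.0 / math.sqrt(num_nodes * math.log(num_nodes))
    upper = 1.0 / math.sqrt(num_nodes)
    return lower, upper

def total_bounds(num_nodes):
    """Scaling orders for total throughput: K times the per-node orders."""
    lower, upper = per_node_bounds(num_nodes)
    return num_nodes * lower, num_nodes * upper

sizes = (10, 100, 1000, 10000)
table = {k: (per_node_bounds(k), total_bounds(k)) for k in sizes}
for k in sizes:
    (node_lo, node_hi), (total_lo, total_hi) = table[k]
    print(k, node_lo, node_hi, total_lo, total_hi)
```

As K grows, both per-node orders shrink toward zero while both total-throughput orders keep growing, in line with the discussion above.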

An alternative approach to scaling laws is to compute achievable rate regions based on suboptimal transmis- 
sion strategies. This approach was taken in [90] to obtain achievable rate regions based on a time-division strategy 
associated with all possible rate matrices. The rate matrices describe the set of rates that can be sustained 
simultaneously by all source-destination pairs at any snapshot in time. By taking a convex combination of rate matrices 
at different timeslots, all achievable rates between source-destination pairs under a time division strategy can be 
obtained. A rate matrix is a function of the nodes transmitting at that time and the resulting SINR on all links, 
as well as the transmission strategy. The more capable the transmission strategy, the larger the data rates in a 
given matrix and the more matrices that are available for use in the time-division scheme. Some of the strategies 
considered in [90] include variable rate transmission, single hop or multihop routing, power control, and succes- 
sive interference cancellation. The framework can also include the effects of mobility and fading. Figure 16.4 
illustrates a two-dimensional slice of a rate region for a network of five nodes randomly distributed in a square 
area. It is assumed that signal propagation between nodes is governed by the simplified path loss model with path 
loss exponent $\gamma = 4$. This two-dimensional slice of the 20-dimensional rate region indicates the rates achievable 
between two pairs of nodes: from node 1 to 2, and from node 3 to 4, when all other nodes in the network may be 
used to help forward traffic between these nodes but do not generate any independent data of their own. The figure 
assumes variable-rate transmission based on the link SINRs, and plots the achievable rate region assuming single- 
hop or multihop routing, spatial reuse, power control, and successive interference cancellation. We see substantial 
capacity increase by adding multihop routing, spatial reuse, and interference cancellation. Power control does not 
provide significant increase, because adaptive modulation is already being exploited, and adding power control in 
addition does not make that much difference, at least for this particular network configuration. 




Figure 16.4: Capacity region slice of 5 node network along the plane $R_{ij} = 0$, $\{ij\} \neq \{12\}, \{34\}$, $i \neq j$: (a) Single hop 
routing, no spatial reuse, (b) Multihop routing, no spatial reuse, (c) Multihop routing with spatial reuse, (d) Two level power 
control added to (c). (e) Successive interference cancellation added to (c). 
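
The time-division construction behind these rate regions can be sketched in Python as a convex combination of rate matrices. The sketch is simplified: it tracks only the rate on each active link and omits the bookkeeping for relayed traffic used in [90]. The three-node example, the rates, and the function name are hypothetical.

```python
def time_division_rates(rate_matrices, time_fractions):
    """Convex combination of rate matrices.

    Entry [i][j] of the result is the long-run average rate from node i to
    node j when matrix k is active for a fraction time_fractions[k] of the time.
    """
    if abs(sum(time_fractions) - 1.0) > 1e-9 or any(a < 0 for a in time_fractions):
        raise ValueError("time fractions must be nonnegative and sum to one")
    n = len(rate_matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for matrix, alpha in zip(rate_matrices, time_fractions):
        for i in range(n):
            for j in range(n):
                combined[i][j] += alpha * matrix[i][j]
    return combined

# Hypothetical 3-node example (rates in Mbps); one link is active per snapshot.
r_a = [[0, 2, 0], [0, 0, 0], [0, 0, 0]]   # node 0 -> node 1 at 2 Mbps
r_b = [[0, 0, 0], [0, 0, 4], [0, 0, 0]]   # node 1 -> node 2 at 4 Mbps
schedule = time_division_rates([r_a, r_b], [0.75, 0.25])
print(schedule)
```

Sweeping the time fractions over all valid values traces out the set of rates achievable by time division between these two snapshots. A more capable transmission strategy adds more matrices, with larger entries, to combine.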

Network capacity regions under different forms of cooperative diversity have also been explored [91, 92, 
93, 94, 95]. Since the capacity region of a general ad hoc network is unknown, capacity under cooperation has 
generally been characterized by lower bounds based on achievable rate regions or upper bounds based on the 
cutset mutual information bound. Results show that cooperation can lead to substantial gains in capacity, but the 
advantages of either transmitter or receiver cooperation as well as the most advantageous cooperative techniques 
to use are highly dependent on the network topology and the availability of channel information. 

16.6 Energy-Constrained Networks 

Many ad hoc wireless network nodes are powered by batteries with a limited lifetime. Thus, it is important to 
consider the impact of energy constraints in the design of ad hoc wireless networks. Devices with rechargeable 
batteries must conserve energy to maximize time between recharging. In addition, many interesting applications 
have devices that cannot be recharged, e.g. sensors that are embedded in walls or dropped into a remote region. 
Such radios must operate for years solely on battery power and energy that can be harvested from the environment. 
The μ-AMPS and Picoradio projects are aimed at developing radios for these applications that can operate on less 
than 100 microwatts and exploit energy-harvesting to prolong lifetime [96, 97, 27]. 

Energy constraints impact the hardware operation, transmit power, and the signal processing associated with 
node operation. The required transmit energy-per-bit for a given BER target in a noisy channel is minimized by 
spreading the signal energy over all available time and bandwidth dimensions [98]. However, transmit power is not 
the only factor in power consumption. The signal processing associated with packet transmission and reception, 
and even hardware operation in a standby mode, consume nonnegligible power as well [99, 10, 100]. This entails 
interesting energy tradeoffs across protocol layers. At the physical layer many communications techniques that 
reduce transmit power require a significant amount of signal processing. It is widely assumed that the energy 
required for this processing is small and continues to decrease with ongoing improvements in hardware technology 
[10, 101]. However, the results in [99, 100] suggest that these energy costs are still significant. This would 
indicate that energy-constrained systems must develop energy-efficient processing techniques that minimize power 
requirements across all levels of the protocol stack and also minimize message passing for network control, as 
these entail significant transmitter and receiver energy costs. Sleep modes for nodes must be similarly optimized, 
since these modes conserve standby energy but may entail energy costs at other protocol layers due to associated 
complications in access and routing. The hardware and operating system design in the node can also be optimized 
to conserve energy: techniques for this optimization are described in [100, 102]. In fact, energy constraints impact 
all layers of the protocol stack, and hence make cross-layer design even more important for energy-constrained 
networks to meet their performance requirements [103, 104, 105, 9]. In this section we describe some of the 
dominant design considerations for ad hoc wireless networks with energy-constrained nodes. 

16.6.1 Modulation and Coding 

Modulation and coding choices are typically made based on tradeoffs between required transmit power, data rate, 
BER, and complexity. However, the power consumed within the analog and DSP circuitry can be comparable to the 
required transmit power for short-range applications. In this case design choices should be based on the total energy 
consumption, including both the transmit and circuit energy consumption. Modeling circuit energy consumption 
is quite challenging, and depends very much on the exact hardware used [106]. This makes it difficult to make 
broad generalizations regarding tradeoffs between circuit and transmit energy. However, the tradeoffs certainly 
exist, especially for short-range applications where transmit energy can be quite low. 

Since circuit energy consumption increases with transmission time, minimizing transmission time and putting 
nodes to sleep can incur significant energy savings. In [96] these ideas were investigated, and it was shown 
that M-ary modulation may enable energy saving over binary modulation for some short-range applications by 
decreasing the transmission time and shutting down most of the circuitry after transmission. In [107] this approach 
was analyzed for MQAM modulation, and optimal strategies to minimize the total energy consumption were developed. 
These ideas were extended in [108] to jointly optimize modulation bandwidth, transmission time, and constellation 
size for MQAM and MFSK in both AWGN and Rayleigh fading channels. These results indicate that energy 
consumption is significantly reduced by optimizing transmission time relative to transmission distance: at large 
distances transmit power dominates, so smaller constellations with larger transmission times are best, but the 
opposite is true at small transmission distances. As a result, MQAM was slightly more energy efficient than 
MFSK at short distances since it could transmit over a shorter time duration, but at larger distances MFSK was 
better due to the superior energy characteristics of nonlinear amplifiers. 
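
The dependence of the best constellation size on distance can be reproduced qualitatively with a toy energy model in Python. This is not the model of [107, 108]: the required transmit power is assumed to scale as (2^b - 1) d^κ for b bits per symbol at distance d, the circuit power is assumed constant while the radio is on, and every parameter value is hypothetical.

```python
def energy_per_bit(bits_per_symbol, distance_m, symbol_rate_hz=10e3,
                   circuit_power_w=0.1, tx_coeff_w=1e-6, path_loss_exp=3.5):
    """Total (transmit + circuit) energy per bit in joules under the toy model.

    Transmit power grows with constellation size and distance, while the time
    the radio must stay on per bit shrinks as more bits are packed per symbol.
    """
    transmit_power = tx_coeff_w * (2 ** bits_per_symbol - 1) * distance_m ** path_loss_exp
    time_per_bit = 1.0 / (bits_per_symbol * symbol_rate_hz)
    return (transmit_power + circuit_power_w) * time_per_bit

def best_constellation(distance_m, candidates=range(1, 9)):
    """Bits per symbol that minimize total energy per bit at the given distance."""
    return min(candidates, key=lambda b: energy_per_bit(b, distance_m))

best_by_distance = {d: best_constellation(d) for d in (1, 10, 100)}
print(best_by_distance)
```

At short range the circuit energy dominates, so the largest constellation (shortest on-time) wins. At long range the transmit energy dominates, so the smallest constellation wins. This matches the qualitative conclusion stated above.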

Energy constraints also change the tradeoffs inherent to coding. Coding typically reduces the required trans- 
mit energy per bit for a given BER target. However, this savings comes at a cost of the processing energy associated 
with the encoder and decoder. Moreover, some coding schemes, such as block and convolutional codes, encode bits 
into a codeword that is longer than the original bit sequence: this is sometimes referred to as bandwidth expansion. 
While the total transmit energy required for the codeword to achieve a given BER may be less than that required for 
the uncoded bits, it takes longer for the codeword to be sent, and a longer transmission time consumes more circuit 
energy. Joint modulation and coding techniques such as trellis coding do not have bandwidth expansion and there- 
fore do not incur a bandwidth expansion energy penalty. However, their encoder and decoder processing energy 
must still be taken into account to determine if they yield a net energy savings. The impact of energy constraints on 
coded MQAM and MFSK was studied in [108]. These results indicate that trellis-coded MQAM provides energy 
savings at almost all transmission distances of interest (above 1 m for the hardware parameters considered). 
However, coding techniques for MFSK are not generally bandwidth efficient, so coding is only beneficial for MFSK at 
moderate transmission distances (above 30 meters for the hardware parameters considered). 

16.6.2 MIMO and Cooperative MIMO 

MIMO techniques can significantly enhance performance of wireless systems through multiplexing or diversity 
gain. For a given transmit energy per bit, multiplexing gain provides a higher data rate, and diversity gain provides 
a lower BER in fading. However, MIMO systems entail significantly more circuit energy consumption than their 
single-antenna counterparts, due to the fact that separate circuitry is required for each antenna signal path, and the 
signal processing associated with MIMO can be highly complex. Thus, it is not clear if MIMO techniques result 
in performance gain under energy constraints. This question was investigated in [109], where it was found that 
MIMO does provide energy savings over a single antenna system for most transmission distances of interest if the 
constellation size is optimized relative to the distance. The reason is that the MIMO system can support a higher 
data rate for a given energy per bit, so it transmits the bits quicker and can then be shut down to save energy. 

Many energy-constrained networks consist of very small nodes that cannot support multiple antennas. In this
case nodes that are close together can exchange information to form a multiple-antenna transmitter, and nodes
close together on the receive end can cooperate to form a multiple-antenna receiver, as shown in Figure 16.5.
As long as the distance between the cooperating nodes is small, the energy associated with their exchange of
information is small relative to the energy savings associated with the resulting MIMO system. The energy savings
of cooperative MIMO were quantified in [109], where it was shown to provide energy savings when the transmit
and receive clusters are separated by 10 to 20 times the distance separating the cooperating nodes. When there is less of a
difference between the separation of the cooperating nodes and the transmission distance, the energy cost required
for the local exchange of information exceeds the energy benefits of cooperating. Cooperative MIMO is one form
of cooperative diversity. Others were discussed in Section 16.3.3, and these other techniques may provide energy
savings comparable to or exceeding those of cooperative MIMO, depending on the network topology.




16.6.3 Access, Routing, and Sleeping 

Random access schemes can be made more energy efficient by minimizing collisions and the resulting retrans- 
missions, as well as optimizing transmit power to the minimum required for successful transmission. One way 
to reduce collisions is to increase error protection as collisions become more frequent [110]. Alternatively, adap- 
tively minimizing power through probing as part of the random access protocol has been shown to significantly 
increase energy efficiency [110, 55]. Another method for energy-efficient access is to formulate the distributed
access problem using a game theoretic approach, where energy and delay are costs associated with the game [112].
Several different approaches to energy-efficient access were evaluated in [111]. However, no clear winner emerged,
since performance of each protocol was highly dependent on the channel characteristics. Delay and fairness constraints
can also be incorporated into an energy-efficient access framework, as investigated in [113]. Many of these
techniques avoid collisions through a version of TDMA, although setting up channelized access under distributed 
control can lead to large delays. 

If users have long strings of packets or continuous stream data, then random access works poorly as most 
transmissions result in collisions. Thus channels must be assigned to users in a more systematic fashion by trans- 
mission scheduling. Energy constraints add a new wrinkle to scheduling optimization. From [98], the energy 
required to send a bit is minimized by transmitting it over all available bandwidth and time dimensions. However, 
when multiple users wish to access the channel, the system time and bandwidth resources must be shared among 
all users. Recent work has investigated optimal scheduling algorithms to minimize transmit energy for multiple 
users sharing a channel [114]. In this work scheduling was optimized to minimize the transmission energy required 
by each user subject to a deadline or delay constraint. The energy minimization was based on judiciously varying 
packet transmission time (and corresponding energy consumption) to meet the delay constraints of the data. This 
scheme was shown to be significantly more energy efficient than a deterministic schedule with the same deadline 
constraint. 

Energy-constrained networks also require routing protocols that optimize routes relative to energy consump- 
tion. If the rate of energy consumption is not evenly distributed across all nodes, some nodes may expire sooner 
than others, leading to a partitioning of the network. Routing can be optimized to minimize end-to-end energy 
consumption by applying the standard optimization procedure described in Section 16.3.3, but using energy per 
hop instead of congestion or delay as the hop cost [115]. Alternatively, the routes can be computed based on costs 
associated with the batteries in each node, for example the max-min battery lifetime across all nodes in the net- 
work [115, 116]. Different cost functions to optimize energy-constrained routing were evaluated via simulation in 
[115] and were all roughly equivalent. The cost function can also be extended to include the traditional metric of 
delay along with energy [117]. This method allows the route optimization to trade off between delay and energy 
consumption through different weighting of their respective contribution to the overall cost function. Note that 
computation and dissemination of routing tables can entail significant cost: this can be avoided by routing traffic
geographically, i.e. in the general direction of its destination, which requires little advance computation [119].
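The idea of using energy per hop as the hop cost can be illustrated with a short sketch. The code below is only a minimal illustration, not the procedure of Section 16.3.3 or of [115]: it assumes Dijkstra's algorithm as the shortest-path routine, and the function name `min_energy_route`, the four-node topology, and the hop energies are all hypothetical.

```python
import heapq

def min_energy_route(graph, source, dest):
    """Dijkstra's algorithm with energy per hop as the link cost.

    graph maps node -> {neighbor: energy_per_hop}; all values are illustrative.
    Returns (total_energy, route), or (inf, []) if dest is unreachable.
    """
    best = {source: 0.0}          # lowest known energy to reach each node
    prev = {}                     # predecessor on the lowest-energy route
    heap = [(0.0, source)]
    done = set()
    while heap:
        energy, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dest:
            break
        for nbr, hop_energy in graph.get(node, {}).items():
            cand = energy + hop_energy
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand, nbr))
    if dest not in best:
        return float("inf"), []
    route = [dest]
    while route[-1] != source:
        route.append(prev[route[-1]])
    return best[dest], route[::-1]

# Hypothetical 4-node network; hop energies in arbitrary units.
example = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.5, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
energy, route = min_energy_route(example, "A", "D")
print(energy, route)
```

In this toy topology the three-hop route uses less total energy than either two-hop route. Replacing the hop cost with a battery-based metric, or with a weighted sum of delay and energy, would change only the numbers stored in `graph`.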

Energy-constrained nodes consume significant power even in standby mode, where they are just passive par- 
ticipants in the network with minimal exchange of data to maintain their network status. The paging industry 
developed a solution to this problem several decades ago by scheduling “sleep” periods for pagers. The basic idea 
is that each pager need only listen for transmissions during certain short periods of time. This is a simple solution 
to implement when a central controller is available. It is less obvious how to implement such strategies within 
the framework of distributed network control. Sleep decisions must take into account network connectivity, so it 
follows that these decisions are local, but not autonomous. Mechanisms that support such decisions can be based
on neighbor discovery coupled with some means for ordering decisions within the neighborhood. In a given area, 
the opportunity to sleep should be circulated among the nodes, ensuring that connectivity is not lost through the 
coincidence of several identical decisions to go to sleep. 

16.6.4 Cross-Layer Design under Energy Constraints 

The unique attributes of energy-constrained networks make them prime candidates for cross-layer design. If node 
batteries cannot be recharged, then each node can only transmit a finite number of bits before it dies, after which 
time it is no longer available to perform its intended function (e.g. sensing) or to participate in network activities 
such as routing. Thus, energy must be used judiciously across all layers of the protocol stack to prolong network 
lifetime and meet the requirements of the application. 

Energy-efficiency at all layers of the protocol stack typically imposes tradeoffs between energy consumption, 
delay, and throughput [118]. However, at any given layer, the optimal operating point on this tradeoff curve must
be driven by considerations at higher layers. For example, if a node transmits slowly it conserves transmit energy,
but this complicates access for other nodes and increases end-to-end delay. A routing protocol may use a centrally- 
located node for energy-efficient routing, but this will increase congestion and delay on that route, as well as burn 
up that node’s battery power quickly, thereby removing it from the network. Ultimately the tradeoffs between 
energy, delay, throughput, and node/network lifetime must be optimized relative to the application requirements. 
An emergency rescue operation needs information quickly, but typically the network need only last a few hours or 
days. In contrast, a sensor network embedded into the concrete of a bridge to measure stress and strain must last 
decades, but the information can be collected every day or week. 



16.6.5 Capacity per Unit Energy 

When transmit energy is constrained it is not possible to transmit any number of bits with asymptotically small error 
probability. This is easy to see intuitively by considering the transmission of a single bit. The only way to ensure 
that two different values in signal space, representing the two possible bit values, can be decoded with arbitrarily 
small error is to make their separation arbitrarily large, which requires arbitrarily large energy. Since arbitrarily 
small error probability is not possible under a hard energy constraint, a different notion of reliable communication 
is needed for energy-constrained nodes. 

A capacity definition for reliable communication under energy constraints was proposed in [120, 98] as the
maximum number of bits per unit energy that can be transmitted so that the probability of error goes to zero with
energy. This new notion of capacity per unit energy requires both energy and blocklength to grow asymptotically
large for asymptotically small error probability. Thus, a finite energy system transmitting at capacity per unit
energy does not have an asymptotically small error probability. Insight into this definition for AWGN channels
is obtained by examining the minimum energy per bit required to transmit at the normalized Shannon capacity
$C_B = C/B$ bps/Hz [98]. Specifically, the received energy per bit equals the ratio of received power to data rate:
$E_b = P/R = P/C$ for transmission at rates approaching the Shannon capacity $C$. Using this expression in the
Shannon capacity formula for an AWGN channel yields

$$C = B \log_2\left(1 + \frac{P}{N_0 B}\right) = B \log_2\left(1 + \frac{E_b C}{N_0 B}\right). \tag{16.7}$$



Inverting (16.7) yields the energy per bit required to transmit at rates approaching the normalized capacity
$C_B = C/B$ as

$$\frac{E_b}{N_0}(C_B) = \frac{2^{C_B} - 1}{C_B}. \tag{16.8}$$



As the channel bandwidth $B$ increases, $C_B$ goes to zero, yielding the minimum energy per bit in the wideband
limit as

$$\left(\frac{E_b}{N_0}\right)_{\min} = \lim_{C_B \to 0} \frac{2^{C_B} - 1}{C_B} = \ln 2 = -1.59 \text{ dB}. \tag{16.9}$$

It was also shown in [98] that a form of on-off signaling such as pulse position modulation achieves this minimum
energy per bit in AWGN. Moreover, results in [122, 123] indicate that the minimum energy per bit for reliable 
communication in flat fading is also given by (16.9), even when the fading is unknown at the receiver. These 
results indicate that in the limit of infinite bandwidth, minimum energy per bit is not affected by fading or receiver 
knowledge and that on-off signaling is near optimal for minimum energy communication. 
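The behavior of the required energy per bit as the normalized capacity shrinks can be checked numerically. The sketch below simply evaluates the expression $(2^{C_B} - 1)/C_B$ from (16.8) at a few values of $C_B$ and compares the result with the wideband limit $\ln 2$; the helper names `ebn0_required` and `to_db` and the sample values of $C_B$ are illustrative choices.

```python
import math

def ebn0_required(cb):
    """Eb/N0 (linear scale) needed to operate at normalized capacity cb = C/B, per (16.8)."""
    return (2.0 ** cb - 1.0) / cb

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

# Illustrative operating points, from bandwidth-limited to nearly wideband.
for cb in (4.0, 1.0, 0.1, 0.001):
    print(f"C/B = {cb:6.3f} bps/Hz -> Eb/N0 = {to_db(ebn0_required(cb)):6.2f} dB")

# Wideband limit: Eb/N0 approaches ln 2 as C/B goes to zero.
wideband_limit_db = to_db(math.log(2.0))
print(f"wideband limit: {wideband_limit_db:.2f} dB")
```

The required $E_b/N_0$ falls monotonically as $C_B$ decreases, which is why spreading a fixed rate over more bandwidth saves transmit energy per bit, with diminishing returns as the limit is approached.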

Many energy-constrained wireless systems have large but finite bandwidth, and the results obtained for the 
limiting case of infinite bandwidth can be misleading for designing such systems, as explored in [123]. In particular, 
the bandwidth required to operate at an energy per bit close to the minimum (16.9) is very sensitive to the fading 
distribution and what is known about the fading at the receiver. If fading is known at the receiver then coherent
QPSK is the optimal signaling scheme with on-off signaling distinctly suboptimal, but an asymptotic form of
on-off signaling is optimal without this receiver knowledge. 

The capacity per unit energy for single user channels has been extended to broadcast and multiple access 
channels in [120, 124, 125]. These results indicate that in the wideband limit, TDMA is optimal for both channels
in that it achieves the minimum energy per bit required for reliable communication. However, in the large but 
finite bandwidth regime, it was shown in [124] that superposition strategies such as CDMA coupled with multiuser 
detection achieve reliable communication with the same minimum energy per bit and less bandwidth than TDMA.

Chapter 16 Problems



1. Consider a signal that must be transmitted 1 km. Suppose the path loss follows the simplified model $P_r = P_t d^{-\gamma}$.

(a) Find the required transmit power $P_t$ such that the received power is 10 mW for $\gamma = 2$ and $\gamma = 4$.

(b) Suppose now that a relay node halfway between the transmitter and receiver is used for multihop
routing. For $\gamma = 2$ and $\gamma = 4$, if both the transmitter and relay transmit at power $P_t$, how big must $P_t$
be such that the relay receives a signal of 10 mW from the transmitter, and the destination receives a
signal of 10 mW from the relay? What is the total power used in the network, i.e. the sum of powers
at both the transmitter and the relay?

(c) Derive a general formula for the total power used in the network with $N$ relays, evenly spaced between
the transmitter and receiver, such that each relay and the receiver receive 10 mW of power.



2. Consider an ad hoc wireless network with three users. Users 1 and 2 require a received SINR of 7 dB
whereas User 3 requires an SINR of 10 dB. Assume all receivers have the same noise power $n_i = 1$ and
there is no processing gain to reduce interference ($\rho = 1$). Assume a matrix of gain values (indexed by the user
numbers) of

$$G = \begin{bmatrix} 1 & .06 & .04 \\ .09 & .9 & .126 \\ .064 & .024 & .8 \end{bmatrix}.$$

(a) Confirm that the vector equation $(I - F)P \ge u$ is equivalent to the SIR constraints of each user.

(b) Show that a feasible power vector exists for this system such that all users achieve their desired SINRs.

(c) Find the optimal power vector $P^*$ such that users achieve their desired SINRs with minimum power.



3. This problem uses the same setup as in the prior problem. Suppose each user starts out with power 50,
so the initial power vector $P(0) = (P_1(0), P_2(0), P_3(0)) = (50, 50, 50)$. Following the recursive formula
(16.4) for each user, plot $P_i(k)$ for each of the users ($i = 1, 2, 3$) over the range $k = 1, \ldots, N$, where $N$
is sufficiently large so that the power vector $P(N)$ is close to its optimal value $P^*$. Also plot the SINRs of
each user for $k = 1, \ldots, N$.



4. Assume an infinite grid of network nodes spaced every 10 meters along a straight line. Assume the simplified
path loss model $P_r = P_t d^{-\gamma}$ and that $P_r$ must be at least 10 mW to establish communication between two
nodes.

(a) For $\gamma = 2$, find $P_{\max}$ such that every node has a neighborhood of $N = 2$ other nodes. What happens
if nodes have a peak power constraint less than this $P_{\max}$?

(b) Find $P_{\max}$ for $\gamma = 2$ and $N = 4$.

(c) Find $P_{\max}$ for $\gamma = 4$ and $N = 4$.

5. Consider a geographical region of 100 square meters. Suppose $N$ nodes are randomly distributed in this
region according to a uniform distribution, and that each node has transmit power sufficient to communicate
with any node that is within a distance of $R$ meters. Compute the average number of nodes $E[N]$ as a
function of radius $R$, $1 < R < 20$, required for the network to be fully connected. The average $E[N]$ should
be computed based on 100 samples of the random node placement within the region.

6. Consider a network with multipath routing, so that every packet is sent over $N$ independent routes from
its source to its destination. Suppose delay $d$ along each of the multiple end-to-end routes is exponentially
distributed with mean $D$: $p(d) = e^{-d/D}/D$, $d > 0$. Find the probability that all $N$ copies of the packet
arrive at the destination with a delay exceeding $D$ as a function of $N$, and evaluate for $N = 1$ and $N = 5$.
Also determine how throughput is affected by multipath routing as a function of $N$. Qualitatively, describe
the throughput/delay tradeoff associated with multipath routing.

7. Show that (16.6) is convex in both $F_{ij}$ and $C_{ij}$.

8. Assume a link with capacity $C = 10$ Mbps. Plot the delay given by (16.5) and by (16.6) for a data flow
ranging from 0 to 10 Mbps: $0 < F_{ij} < 10$ Mbps.

9. In this problem we consider the difference in the two delay metrics (16.5) and (16.6). Let A be the ratio of 
(16.5) over (16.6). 

(a) Find A as a function of $F_{ij}/C_{ij}$.

(b) Using the fact that the flow $F_{ij}$ must be less than $C_{ij}$, find the range of values that A can take.

(c) Find the value of $F_{ij}/C_{ij}$ such that A > 10, i.e. where the delay associated with metric (16.5) exceeds
that of (16.6) by an order of magnitude.

(d) Consider a network design where route costs are computed based on either metric (16.5) or (16.6). For
which metric will the links be more congested and why?

10. This problem shows the gains from cross-layer design between the network layer and the application layer. 
For transmission of applications like video, the application layer will try to use a high-rate encoding scheme 
to improve the quality. Now consider that the application is real-time. Given a capacity assignment made by 
the network layer, if the rate of transmission is high, there will be congestion on the link that will delay the 
packets beyond deadline due to which a lot of packets will not reach the decoder in time leading to poorer 
quality. Thus we see a tradeoff. A very simple model of distortion capturing both these effects can be given 
as 

$$\text{Dist}(R) = D_0 + \frac{\theta}{R - R_0} + \kappa e^{-(C-R)T/L}. \tag{16.10}$$

The first two terms correspond to distortion at the application layer due to source encoding and the last one
corresponds to distortion due to delayed packets. Let $D_0 = .38$, $R_0 = 18.3$ Kbps, $\theta = 2537$, scaling factor
$\kappa = 1$, effective packet length $L = 3040$ bits, play-out deadline $T = 350$ ms. Link capacity $C$ and transmission
rate $R$ are described below.

(a) Suppose the capacity $C$ for the link takes on values 45 Kbps, 24 Kbps, and 60 Kbps with probabilities .5, .25, and
.25, respectively. Find the optimal rate $R$ such that the average distortion Dist($R$) is minimized.
Assume full cooperation between the application layer and the network layer in the sense that the
application layer always knows the instantaneous capacity allocations made at the network layer.

(b) Now consider the case when there is no cross-layer optimization and the application layer encodes
at a fixed rate $R = 22$ Kbps at all times. Find the average distortion Dist($R$) for the same capacity
distribution as given in part (a).

(c) Comparing parts (a) and (b), find the % increase in distortion when cross-layer optimization is not
done.

11. Show that $E_b/N_0$ versus $C_B = C/B$, as given by (16.8), increases with $C$ for $B$ fixed and decreases
with $B$ for $C$ fixed. Also show that

$$\lim_{C_B \to 0} \frac{E_b}{N_0}(C_B) = \ln 2.$$

Appendix A

Representation of Bandpass Signals and
Channels



Many signals in communication systems are real bandpass signals with a frequency response that occupies a narrow
bandwidth $2B$ centered around a carrier frequency $f_c$ with $2B \ll f_c$, as illustrated in Figure A.1. Since bandpass
signals are real, their frequency response has conjugate symmetry, i.e. a bandpass signal $s(t)$ has $|S(f)| = |S(-f)|$
and $\angle S(f) = -\angle S(-f)$. However, bandpass signals are not necessarily conjugate symmetric within the signal
bandwidth about the carrier frequency $f_c$, i.e. we may have $|S(f_c + f)| \neq |S(f_c - f)|$ or $\angle S(f_c + f) \neq -\angle S(f_c - f)$
for some $f < B$. This asymmetry in $|S(f)|$ is illustrated in Figure A.1. Bandpass signals result from modulation
of a baseband signal by a carrier, or from filtering a deterministic or random signal with a bandpass filter.
The bandwidth $2B$ of a bandpass signal is roughly equal to the range of frequencies around $f_c$ where the signal
has nonnegligible amplitude. Bandpass signals are commonly used to model transmitted and received signals in
communication systems. These are real signals since the transmitter circuitry can only generate real sinusoids (not
complex exponentials) and the channel just introduces an amplitude and phase change at each frequency of the real
transmitted signal.

Figure A.1: Bandpass Signal $S(f)$.

We begin by representing a bandpass signal $s(t)$ at carrier frequency $f_c$ in the following form:

$$s(t) = s_I(t) \cos(2\pi f_c t) - s_Q(t) \sin(2\pi f_c t), \tag{A.1}$$

where $s_I(t)$ and $s_Q(t)$ are real lowpass (baseband) signals of bandwidth $B \ll f_c$. This is a common representation
for bandpass signals or noise. In fact, modulations such as MPSK and MQAM are commonly described using
this representation. We call $s_I(t)$ the in-phase component of $s(t)$ and $s_Q(t)$ the quadrature component of $s(t)$.
Define the complex signal $u(t) = s_I(t) + j s_Q(t)$, so $s_I(t) = \Re\{u(t)\}$ and $s_Q(t) = \Im\{u(t)\}$. Then $u(t)$ is a
complex lowpass signal of bandwidth $B$. With this definition we see that

$$s(t) = \Re\{u(t)\} \cos(2\pi f_c t) - \Im\{u(t)\} \sin(2\pi f_c t) = \Re\left\{u(t) e^{j 2\pi f_c t}\right\}. \tag{A.2}$$

The representation on the right hand side of this equation is called the complex lowpass representation of the
bandpass signal $s(t)$, and the baseband signal $u(t)$ is called the equivalent lowpass signal for $s(t)$ or its complex
envelope. Note that $U(f)$ is only conjugate symmetric about $f = 0$ if $u(t)$ is real, i.e. if $s_Q(t) = 0$.

Using properties of the Fourier transform we can show that

$$S(f) = .5\left[U(f - f_c) + U^*(-f - f_c)\right]. \tag{A.3}$$

Since $s(t)$ is real, $S(f)$ is symmetric about $f = 0$. However, the lowpass signals $U(f)$ and $U^*(f)$ are not
necessarily symmetric about $f = 0$, which leads to an asymmetry of $S(f)$ within the bandwidth $2B$ about the
carrier frequency $f_c$, as shown in Figure A.1. In fact, $S(f)$ is only symmetric about the carrier frequency within
this bandwidth if $u(t) = s_I(t)$, i.e. if there is no quadrature component in $u(t)$. We will see shortly that this
asymmetry affects the response of bandpass channels to bandpass signals.

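An alternate representation of the equivalent lowpass signal is

$$u(t) = a(t) e^{j\phi(t)}, \tag{A.4}$$

with envelope

$$a(t) = \sqrt{s_I^2(t) + s_Q^2(t)}, \tag{A.5}$$

and phase

$$\phi(t) = \tan^{-1}\left(\frac{s_Q(t)}{s_I(t)}\right). \tag{A.6}$$

With this representation

$$s(t) = \Re\left\{a(t) e^{j\phi(t)} e^{j 2\pi f_c t}\right\} = a(t) \cos(2\pi f_c t + \phi(t)). \tag{A.7}$$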
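The equivalence of the in-phase/quadrature form (A.1), the complex lowpass form (A.2), and the envelope/phase form (A.7) can be verified numerically. The following is a small sketch under assumed values: the carrier frequency, the two lowpass components, and the sampling grid are hypothetical, chosen only so that the lowpass bandwidth is far below the carrier. It uses `atan2` for the phase, which agrees with the inverse tangent of $s_Q(t)/s_I(t)$ up to the choice of quadrant and remains valid when $s_I(t)$ is negative or zero.

```python
import cmath
import math

FC = 1000.0   # carrier frequency in Hz (illustrative)

def s_I(t):
    """Hypothetical in-phase lowpass component."""
    return math.cos(2 * math.pi * 5.0 * t)

def s_Q(t):
    """Hypothetical quadrature lowpass component."""
    return 0.5 * math.sin(2 * math.pi * 3.0 * t)

def bandpass_iq(t):
    """In-phase/quadrature form, as in (A.1)."""
    return (s_I(t) * math.cos(2 * math.pi * FC * t)
            - s_Q(t) * math.sin(2 * math.pi * FC * t))

def bandpass_complex(t):
    """Real part of u(t) exp(j 2 pi fc t), as in (A.2)."""
    u = complex(s_I(t), s_Q(t))
    return (u * cmath.exp(1j * 2 * math.pi * FC * t)).real

def bandpass_envelope(t):
    """Envelope/phase form, as in (A.7)."""
    a = math.hypot(s_I(t), s_Q(t))
    phi = math.atan2(s_Q(t), s_I(t))
    return a * math.cos(2 * math.pi * FC * t + phi)

# Compare the three forms on a grid of sample times.
times = [k / 8000.0 for k in range(400)]
max_gap = max(
    max(abs(bandpass_iq(t) - bandpass_complex(t)),
        abs(bandpass_iq(t) - bandpass_envelope(t)))
    for t in times
)
print(max_gap)
```

The printed gap is at the level of floating-point rounding, consistent with the three expressions describing the same real signal.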

Let us now consider a real channel impulse response $h(t)$ with Fourier transform $H(f)$. If $h(t)$ is real then
$H^*(-f) = H(f)$. In communication systems we are mainly interested in the channel frequency response $H(f)$
for $|f - f_c| < B$, since only these frequency components of $H(f)$ affect the received signal within the bandwidth
of interest. A bandpass channel is similar to a bandpass signal: it has a real impulse response $h(t)$ with frequency
response $H(f)$ centered at a carrier frequency $f_c$ with a bandwidth of $2B \ll f_c$. To capture the frequency
response of $H(f)$ around $f_c$, we develop an equivalent lowpass channel model similar to the equivalent lowpass
signal model as follows. Since the impulse response $h(t)$ corresponding to $H(f)$ is a bandpass signal, it can be
written using an equivalent lowpass representation as

$$h(t) = 2\Re\left\{h_l(t) e^{j 2\pi f_c t}\right\}, \tag{A.8}$$

where the extra factor of 2 is to avoid constant factors in the $H(f)$ representation given by (A.9). We call $h_l(t)$ the
equivalent lowpass channel impulse response for $H(f)$. From (A.2)-(A.3), the representation (A.8) implies that

$$H(f) = H_l(f - f_c) + H_l^*(-f - f_c), \tag{A.9}$$

so $H(f)$ consists of two components: $H_l(f)$ shifted up by $f_c$, and $H_l^*(-f)$ shifted down by $f_c$. Note that if
$H(f)$ is conjugate symmetric about the carrier frequency $f_c$ within the bandwidth $2B$ then $h_l(t)$ will be real
and its frequency response $H_l(f)$ conjugate symmetric about zero. However, in many wireless channels, such as
frequency-selective fading channels, $H(f)$ is not conjugate symmetric about $f_c$, in which case $h_l(t)$ is complex
with in-phase component $h_{l,I}(t) = \Re\{h_l(t)\}$ and quadrature component $h_{l,Q}(t) = \Im\{h_l(t)\}$. Note that if $h_l(t)$ is
complex then $H_l(f)$ is not conjugate symmetric about zero.

We now use equivalent lowpass signal and channel models to study the output of a bandpass channel with
a bandpass signal input. Let $s(t)$ denote the input signal with equivalent lowpass signal $u(t)$. Let $h(t)$ denote
the bandpass channel impulse response with equivalent lowpass channel impulse response $h_l(t)$. The transmitted
signal $s(t)$ and channel impulse response $h(t)$ are both real, so the channel output $r(t) = s(t) * h(t)$ is also real,
with frequency response $R(f) = H(f)S(f)$. Since $S(f)$ is a bandpass signal, $R(f)$ will also be a bandpass signal.
Therefore, it has a complex lowpass representation of

$$r(t) = \Re\left\{v(t) e^{j 2\pi f_c t}\right\}. \tag{A.10}$$

We now consider the relationship between the equivalent lowpass signals corresponding to the channel input
$s(t)$, channel impulse response $h(t)$, and channel output $r(t)$. We can express the frequency response of the
channel output as

$$R(f) = H(f)S(f) = .5\left[H_l(f - f_c) + H_l^*(-f - f_c)\right]\left[U(f - f_c) + U^*(-f - f_c)\right]. \tag{A.11}$$

For bandpass signals and channels where the bandwidth $B$ of $u(t)$ and $h_l(t)$ is much less than the carrier frequency
$f_c$, we have

$$H_l(f - f_c) U^*(-f - f_c) = 0$$

and

$$H_l^*(-f - f_c) U(f - f_c) = 0.$$

Thus,

$$R(f) = .5\left[H_l(f - f_c)U(f - f_c) + H_l^*(-f - f_c)U^*(-f - f_c)\right]. \tag{A.12}$$

From (A.2)-(A.3), (A.10) implies that

$$R(f) = .5\left[V(f - f_c) + V^*(-f - f_c)\right]. \tag{A.13}$$

Equating terms at positive and negative frequencies in (A.12) and (A.13), we get that

$$V(f - f_c) = H_l(f - f_c)U(f - f_c) \tag{A.14}$$

and

$$V^*(-f - f_c) = H_l^*(-f - f_c)U^*(-f - f_c) \tag{A.15}$$

or, equivalently, that

$$V(f) = H_l(f)U(f). \tag{A.16}$$

Taking the inverse Fourier transform yields that

$$v(t) = u(t) * h_l(t). \tag{A.17}$$

Thus, we can obtain the equivalent lowpass signal $v(t)$ for the received signal $r(t)$ by taking the convolution of
$h_l(t)$ and $u(t)$. The received signal is therefore given by

$$r(t) = \Re\left\{(u(t) * h_l(t)) e^{j 2\pi f_c t}\right\}. \tag{A.18}$$

Note that $V(f) = H_l(f)U(f)$ is conjugate symmetric about $f = 0$ only if both $U(f)$ and $H_l(f)$ are. In other
words, the equivalent lowpass received signal will be complex with nonzero in-phase and quadrature components
if either $u(t)$ or $h_l(t)$ is complex. Moreover, if $u(t) = s_I(t)$ is real (no quadrature component) but the channel
impulse response $h_l(t) = h_{l,I}(t) + j h_{l,Q}(t)$ is complex (e.g. as with frequency-selective fading) then

$$v(t) = s_I(t) * (h_{l,I}(t) + j h_{l,Q}(t)) = s_I(t) * h_{l,I}(t) + j s_I(t) * h_{l,Q}(t) \tag{A.19}$$

is complex, so the received signal will have both an in-phase and a quadrature component. More generally, if
$u(t) = s_I(t) + j s_Q(t)$ and $h_l(t) = h_{l,I}(t) + j h_{l,Q}(t)$ then

$$v(t) = [s_I(t) + j s_Q(t)] * [h_{l,I}(t) + j h_{l,Q}(t)] = [s_I(t) * h_{l,I}(t) - s_Q(t) * h_{l,Q}(t)] + j[s_I(t) * h_{l,Q}(t) + s_Q(t) * h_{l,I}(t)]. \tag{A.20}$$

So the in-phase component of v(t) depends on both the in-phase and quadrature components of u(t), and similarly 
for the quadrature component of v(t). This creates problems in signal detection, since it causes the in-phase and 
quadrature parts of a modulated signal to interfere with each other in the demodulator. 
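The coupling between in-phase and quadrature components can be seen in a discrete-time sketch of (A.20). The sample values below are hypothetical, and the helper `conv` is a plain discrete convolution written out for illustration; it stands in for the continuous-time convolution in the text.

```python
def conv(x, h):
    """Full discrete convolution of two sequences (real or complex)."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

# Hypothetical sampled lowpass components (illustrative values only).
s_i = [1.0, -1.0, 1.0, 1.0]     # in-phase part of u
s_q = [1.0, 1.0, -1.0, 1.0]     # quadrature part of u
h_i = [1.0, 0.5]                # in-phase part of h_l
h_q = [0.0, 0.3]                # quadrature part of h_l

u = [complex(a, b) for a, b in zip(s_i, s_q)]
h_l = [complex(a, b) for a, b in zip(h_i, h_q)]
v = conv(u, h_l)                # complex convolution, as in (A.17)

# Component form on the right-hand side of (A.20).
v_i = [a - b for a, b in zip(conv(s_i, h_i), conv(s_q, h_q))]
v_q = [a + b for a, b in zip(conv(s_i, h_q), conv(s_q, h_i))]

print([round(z.real, 3) for z in v], [round(x, 3) for x in v_i])
```

Because `h_q` is nonzero, the in-phase output `v_i` differs from `conv(s_i, h_i)`: part of the quadrature input leaks into the in-phase output, which is the interference described above.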

The main purpose for the equivalent lowpass representations is to analyze bandpass communication systems 
using the equivalent lowpass models for the transmitted signal, channel impulse response, and received signal. This 
removes the carrier terms from the analysis, in particular the dependency of the analysis on the carrier frequency
$f_c$.

Appendix B

Probability Theory, Random Variables, and
Random Processes

B.1 Probability Theory

Probability theory provides a mathematical characterization for random events. Such events are defined on an
underlying probability space $(\Omega, \mathcal{E}, p(\cdot))$. The probability space consists of a sample space $\Omega$ of possible outcomes
for random events, a set of random events $\mathcal{E}$ where each $A \in \mathcal{E}$ is a subset of $\Omega$, and a probability measure $p(\cdot)$
defined on these subsets. Thus, $\mathcal{E}$ is a set of sets, and the probability measure $p(A)$ is defined for every set $A \in \mathcal{E}$.
A probability space requires that the set $\mathcal{E}$ is a $\sigma$-field. Intuitively, a set of sets $\mathcal{E}$ is a $\sigma$-field if it contains all
intersections, unions, and complements of its elements.$^1$ More precisely, $\mathcal{E}$ is a $\sigma$-field if the set of all possible
outcomes $\Omega$ is one of the sets in $\mathcal{E}$, a set $A \in \mathcal{E}$ implies that $A^c \in \mathcal{E}$, and for any sets $A_1, A_2, \ldots$ with $A_i \in \mathcal{E}$,
we have $\bigcup_{i=1}^{\infty} A_i \in \mathcal{E}$. The set $\mathcal{E}$ must be a $\sigma$-field for the probability of intersections and unions of random events
to be defined. We also require that the probability measure associated with a probability space have the following
three fundamental properties:

1. $p(\Omega) = 1$.

2. $0 \le p(A) \le 1$ for any event $A \in \mathcal{E}$.

3. If $A$ and $B$ are mutually exclusive, i.e. their intersection is empty, then $p(A \cup B) = p(A) + p(B)$.

Throughout this section, we only consider sets in $\mathcal{E}$, since the probability measure is only defined on these sets.

Several important characteristics of the probability measure $p(\cdot)$ can be derived from its fundamental properties.
In particular, $p(A^c) = 1 - p(A)$. Moreover, consider sets $A_1, \ldots, A_n$, where $A_i$ and $A_j$, $i \neq j$,
are disjoint ($A_i \cap A_j = \emptyset$). Then if $A_1 \cup A_2 \cup \ldots \cup A_n = \Omega$, we have that $\sum_{i=1}^{n} p(A_i) = 1$. We call
the set $\{A_1, \ldots, A_n\}$ with these properties a partition of $\Omega$. For two sets $A_i$ and $A_j$ that are not disjoint,
$p(A_i \cup A_j) = p(A_i) + p(A_j) - p(A_i \cap A_j)$. This leads to the union bound, which says that for any sets $A_1, \ldots, A_n$,

$$p(A_1 \cup A_2 \cup \ldots \cup A_n) \le \sum_{i=1}^{n} p(A_i). \tag{B.1}$$
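These properties are easy to check on a small finite probability space. The sketch below assumes a hypothetical sample space (one roll of a fair six-sided die) with equally likely outcomes; the events `A1`, `A2`, `A3` and the function `p` are illustrative choices. Exact fractions are used so the comparisons carry no rounding error.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))

def p(event):
    """Probability measure for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

A1 = {1, 2, 3}
A2 = {3, 4}
A3 = {4, 5}

# Union bound: p(A1 u A2 u A3) <= p(A1) + p(A2) + p(A3).
lhs = p(A1 | A2 | A3)
rhs = p(A1) + p(A2) + p(A3)
print(lhs, rhs)

# Two overlapping sets: p(A1 u A2) = p(A1) + p(A2) - p(A1 n A2).
incl_excl = p(A1) + p(A2) - p(A1 & A2)
print(p(A1 | A2), incl_excl)

# Complement rule: p(A^c) = 1 - p(A).
print(p(omega - A1), 1 - p(A1))
```

The bound is loose here because the events overlap; it holds with equality only when the sets are disjoint.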

$^1$We use the notation $A \cap B$ to denote the intersection of $A$ and $B$, i.e. all elements in both $A$ and $B$. The union of $A$ and $B$, denoted
$A \cup B$, is the set of all elements in $A$ or $B$. The complement of a set $A \subset \Omega$, denoted by $A^c$, is defined as all elements in $\Omega$ that are not in
the set $A$.

The occurrence of one random event can affect the probability of another random event, since observing one
random event indicates which subsets in $\mathcal{E}$ could have contributed to the observed outcome. To capture this effect,
we define the probability of event $B$ conditioned on the occurrence of event $A$ as $p(B|A) = p(A \cap B)/p(A)$,
assuming $p(A) \neq 0$. This implies that

$$p(A \cap B) = p(A|B)p(B) = p(B|A)p(A). \tag{B.2}$$



The conditional probability $p(B|A) = p(A \cap B)/p(A)$ essentially normalizes the probability of $B$ with respect to
the outcomes associated with $A$, since it is known that $A$ has occurred. We obtain Bayes Rule from (B.2) as

$$p(B|A) = \frac{p(A|B)p(B)}{p(A)}. \tag{B.3}$$

Independence of events is a function of the probability measure $p(\cdot)$. In particular, events $A$ and $B$ are independent
if $p(A \cap B) = p(A)p(B)$. This implies that $p(B|A) = p(B)$ and $p(A|B) = p(A)$.
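Bayes Rule and the independence condition can be illustrated on a finite sample space. This is a minimal sketch under assumed values: the sample space (a fair six-sided die), the events `A`, `B`, `C`, and the helper names `p` and `p_cond` are hypothetical.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # hypothetical sample space: a fair die

def p(event):
    """Probability measure for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

def p_cond(b, a):
    """Conditional probability p(B|A) = p(A n B) / p(A), assuming p(A) != 0."""
    return p(a & b) / p(a)

A = {2, 4, 6}     # the roll is even
B = {4, 5, 6}     # the roll is at least four
C = {1, 2}        # the roll is at most two

# Bayes Rule: p(B|A) computed directly and via p(A|B) p(B) / p(A).
bayes = p_cond(A, B) * p(B) / p(A)
print(p_cond(B, A), bayes)

# A and C are independent: p(A n C) = p(A) p(C), so p(C|A) = p(C).
independent = p(A & C) == p(A) * p(C)
print(independent, p_cond(C, A), p(C))
```

Events `A` and `B` are not independent in this example, since knowing the roll is even raises the probability that it is at least four from 1/2 to 2/3.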



B.2 Random Variables 



Random variables are defined on an underlying probability space $(\Omega, \mathcal{E}, p(\cdot))$. In particular, a random variable
$X$ is a function mapping from the sample space $\Omega$ to a subset of the real line. If $X$ takes discrete values on the
real line it is called a discrete random variable, and if it takes continuous values it is called a continuous random
variable. The cumulative distribution function (CDF) of a random variable $X$ is defined as $P_X(x) = p(X \le x)$
for some $x \in \mathbb{R}$. The CDF is derived from the underlying probability space as $p(X \le x) = p(X^{-1}((-\infty, x]))$,
where $X^{-1}(\cdot)$ is the inverse mapping from the real line to a subset of $\Omega$: $X^{-1}((-\infty, x]) = \{\omega \in \Omega : X(\omega) \le x\}$.
Properties of the CDF are based on properties of the underlying probability measure. In particular, the CDF satisfies
$0 \le P_X(x) = p(X^{-1}((-\infty, x])) \le 1$. In addition, the CDF is nondecreasing: $P_X(x_1) \le P_X(x_2)$ for $x_1 \le x_2$. That
is because $P_X(x_2) = p(X^{-1}((-\infty, x_2])) = p(X^{-1}((-\infty, x_1])) + p(X^{-1}((x_1, x_2])) \ge p(X^{-1}((-\infty, x_1])) = P_X(x_1)$.

The probability density function (pdf) of a random variable X is defined as the derivative of its CDF,
$p_X(x) \triangleq \frac{d}{dx} P_X(x)$. For X a continuous random variable $p_X(x)$ is a function over the entire real line. For X a discrete
random variable $p_X(x)$ is a set of delta functions at the possible values of X. The pdf, also referred to as the
probability distribution or distribution of X, defines the probability that X lies in a given range of values:

$$p(x_1 < X \le x_2) = p(X \le x_2) - p(X \le x_1) = P_X(x_2) - P_X(x_1) = \int_{x_1}^{x_2} p_X(x)\,dx. \tag{B.4}$$

Since $P_X(\infty) = 1$ and $P_X(-\infty) = 0$, the pdf integrates to 1,

$$\int_{-\infty}^{\infty} p_X(x)\,dx = 1. \tag{B.5}$$



Note that the subscript X is often omitted from the pdf and CDF when it is clear from the context that these
functions characterize the distribution of X. In this case the pdf is written as p(x) and the CDF as P(x).
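The relations (B.4) and (B.5) can be checked numerically for a concrete distribution. The sketch below assumes an exponential random variable with rate 2 (an illustrative choice, not from the text) and uses a simple midpoint rule in place of the exact integral.

```python
import math

RATE = 2.0  # assumed rate parameter of an exponential random variable

def pdf(x):
    """pdf of the assumed exponential distribution."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def cdf(x):
    """CDF of the assumed exponential distribution."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

x1, x2 = 0.5, 1.5
prob_from_pdf = integrate(pdf, x1, x2)     # right-hand side of (B.4)
prob_from_cdf = cdf(x2) - cdf(x1)          # P(x2) - P(x1)
total_mass = integrate(pdf, 0.0, 40.0)     # (B.5), truncated where the tail is negligible
```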

The mean or expected value of a random variable X is its probabilistic average, defined as

$$\mu_X = E[X] \triangleq \int_{-\infty}^{\infty} x\,p_X(x)\,dx. \tag{B.6}$$



The expectation operator $E[\cdot]$ is linear and can also be applied to functions of random variables. In particular, the
mean of a function of X is given by

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,p_X(x)\,dx. \tag{B.7}$$
A function of particular interest is the nth moment of X,

$$E[X^n] \triangleq \int_{-\infty}^{\infty} x^n\,p_X(x)\,dx. \tag{B.8}$$

The variance of X is defined in terms of its mean and second moment as

$$\mathrm{Var}[X] = \sigma_X^2 \triangleq E[(X - \mu_X)^2] = E[X^2] - \mu_X^2. \tag{B.9}$$



The variance characterizes the average squared difference between X and its mean $\mu_X$. The standard deviation
of X, $\sigma_X$, is the square root of its variance. From the linearity of the expectation operator, it is easily shown that
for any constant c, $E[cX] = cE[X]$, $\mathrm{Var}[cX] = c^2\mathrm{Var}[X]$, $E[X + c] = E[X] + c$, and $\mathrm{Var}[X + c] = \mathrm{Var}[X]$.
Thus, scaling a random variable by a constant scales its mean by the same constant and its variance by the constant
squared. Adding a constant to a random variable shifts the mean by the same constant but doesn't affect the
variance.
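The scaling and shifting properties above can be verified exactly on a small discrete distribution. The PMF below is an assumed example, and the helper names are hypothetical.

```python
from fractions import Fraction

# Assumed example PMF of a discrete random variable X: value -> probability.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def mean(p):
    """E[X] for a PMF given as a dict."""
    return sum(x * px for x, px in p.items())

def variance(p):
    """Var[X] = E[(X - mean)^2] for a PMF given as a dict."""
    mu = mean(p)
    return sum((x - mu) ** 2 * px for x, px in p.items())

def transform(p, scale=1, shift=0):
    """PMF of scale*X + shift (scale must be nonzero so values stay distinct)."""
    return {scale * x + shift: px for x, px in p.items()}

c = 5
scaled = transform(pmf, scale=c)    # distribution of cX
shifted = transform(pmf, shift=c)   # distribution of X + c
```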

The distribution of a random variable X can be determined from its characteristic function, defined as

$$\phi_X(\nu) \triangleq E[e^{j\nu X}] = \int_{-\infty}^{\infty} p_X(x)\,e^{j\nu x}\,dx. \tag{B.10}$$

We see from (B.10) that the characteristic function $\phi_X(\nu)$ of X is the inverse Fourier transform of the distribution
$p_X(x)$ evaluated at $f = \nu/(2\pi)$. Thus we can obtain $p_X(x)$ from $\phi_X(\nu)$ as

$$p_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_X(\nu)\,e^{-j\nu x}\,d\nu. \tag{B.11}$$



This will become significant in finding the distribution for sums of random variables. 

Let X be a random variable and g(x) be a function on the real line. Let Y = g(X) define another random variable.
Then $P_Y(y) = \int_{x : g(x) \le y} p_X(x)\,dx$. For g monotonically increasing and one-to-one this becomes
$P_Y(y) = \int_{-\infty}^{g^{-1}(y)} p_X(x)\,dx$. For g monotonically decreasing and one-to-one this becomes $P_Y(y) = \int_{g^{-1}(y)}^{\infty} p_X(x)\,dx$.

We now consider joint random variables. Two random variables must share the same underlying probability
space for their joint distribution to be defined. Let X and Y be two random variables defined on the same probability
space $(\Omega, \mathcal{E}, p(\cdot))$. Their joint CDF is defined as $P_{XY}(x, y) \triangleq p(X \le x, Y \le y)$. Their joint pdf (distribution)
is defined as the derivative of the joint CDF:

$$p_{XY}(x, y) \triangleq \frac{\partial^2 P_{XY}(x, y)}{\partial x\,\partial y}. \tag{B.12}$$

Thus,

$$P_{XY}(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} p_{XY}(v, w)\,dv\,dw. \tag{B.13}$$

For joint random variables X and Y, we can obtain the distribution of X by integrating the joint distribution with
respect to Y:

$$p_X(x) = \int_{-\infty}^{\infty} p_{XY}(x, y)\,dy. \tag{B.14}$$

Similarly,

$$p_Y(y) = \int_{-\infty}^{\infty} p_{XY}(x, y)\,dx. \tag{B.15}$$
The distributions $p_X(x)$ and $p_Y(y)$ obtained in this manner are sometimes referred to as the marginal distributions
relative to the joint distribution $p_{XY}(x, y)$. Note that the joint distribution must integrate to one:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_{XY}(x, y)\,dx\,dy = 1. \tag{B.16}$$



The definitions for joint CDF and joint pdf of two random variables extend in a straightforward manner to any 
finite number of random variables. 

As with random events, observing the value for one random variable can affect the probability of another
random variable. We define the conditional distribution of the random variable Y given a realization X = x of
random variable X as $p_Y(y|X = x) = p_{XY}(x, y)/p_X(x)$. This implies that $p_{XY}(x, y) = p_Y(y|X = x)\,p_X(x)$.
Independence between two random variables X and Y is a function of their joint distribution. Specifically, X and
Y are independent random variables if their joint distribution $p_{XY}(x, y)$ factors into separate distributions for X
and Y: $p_{XY}(x, y) = p_X(x)\,p_Y(y)$. For independent random variables, it is easily shown that for any functions
f(x) and g(y), $E[f(X)g(Y)] = E[f(X)]E[g(Y)]$.

For X and Y joint random variables with joint pdf $p_{XY}(x, y)$, we define their ijth joint moment as

$$E[X^i Y^j] \triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^i y^j\,p_{XY}(x, y)\,dx\,dy. \tag{B.17}$$



The correlation of X and Y is defined as $E[XY]$. The covariance of X and Y is defined as
$\mathrm{Cov}[XY] \triangleq E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X\mu_Y$. Note that the covariance and correlation of X and Y are equal if either X
or Y has mean zero. The correlation coefficient of X and Y is defined in terms of their covariance and standard
deviations as $\rho \triangleq \mathrm{Cov}[XY]/(\sigma_X\sigma_Y)$. We say that X and Y are uncorrelated if their covariance is zero or,
equivalently, their correlation coefficient is zero. Note that uncorrelated random variables (i.e. X and Y with
$\mathrm{Cov}[XY] = E[XY] - \mu_X\mu_Y = 0$) will have a nonzero correlation ($E[XY] \neq 0$) if their means are both nonzero.
For random variables $X_1, \ldots, X_n$, we define their covariance matrix $\Sigma$ as an $n \times n$ matrix with ijth element
$\Sigma_{ij} = \mathrm{Cov}[X_i X_j]$. In particular, the ith diagonal element of $\Sigma$ is the variance of $X_i$: $\Sigma_{ii} = \mathrm{Var}[X_i]$.
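To make the distinction between correlation and covariance concrete, the sketch below evaluates both for a small joint PMF. The joint distribution (two independent variables taking values 1 and 2 with equal probability) is an assumption made for illustration.

```python
from fractions import Fraction

# Assumed joint PMF of (X, Y): independent, each equally likely to be 1 or 2.
joint = {(x, y): Fraction(1, 4) for x in (1, 2) for y in (1, 2)}

def expect(f):
    """E[f(X, Y)] under the joint PMF."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
correlation = expect(lambda x, y: x * y)                    # E[XY]
covariance = expect(lambda x, y: (x - mu_x) * (y - mu_y))   # E[(X - mu_X)(Y - mu_Y)]
```

Here the covariance is zero while the correlation is not, because both means are nonzero.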

Consider two independent random variables X and Y. Let Z = X + Y define a new random variable on the
probability space $(\Omega, \mathcal{E}, p(\cdot))$. We can show directly or by using characteristic functions that the distribution of Z
is the convolution of the distributions of X and Y: $p_Z(z) = (p_X * p_Y)(z) = \int_{-\infty}^{\infty} p_X(x)\,p_Y(z - x)\,dx$. Equivalently,
$\phi_Z(\nu) = \phi_X(\nu)\phi_Y(\nu)$.
With this distribution it can be shown that $E[Z] = E[X] + E[Y]$ and $\mathrm{Var}[Z] = \mathrm{Var}[X] + \mathrm{Var}[Y]$. So for sums of
independent random variables, the mean of the sum is the sum of the means and the variance of the sum is the sum
of the variances.
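For discrete random variables the convolution becomes a sum over probability masses. The following sketch, which assumes two independent fair dice as the example, computes the distribution of the sum and lets one check that means and variances add.

```python
from fractions import Fraction

def convolve(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (dicts value -> probability)."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

def mean(p):
    return sum(x * px for x, px in p.items())

def variance(p):
    mu = mean(p)
    return sum((x - mu) ** 2 * px for x, px in p.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}   # assumed example: a fair die
total = convolve(die, die)                        # distribution of the sum of two dice
```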

A distribution that arises frequently in the study of communication systems is the Gaussian distribution. The
Gaussian distribution for a random variable X is defined in terms of its mean $\mu_X$ and variance $\sigma_X^2$ as

$$p_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-[(x - \mu_X)^2/(2\sigma_X^2)]}. \tag{B.18}$$



The Gaussian distribution, also called the normal distribution, is denoted as $N(\mu_X, \sigma_X^2)$. Note that the tail of the
distribution, i.e. the value of $p_X(x)$ as x moves away from $\mu_X$, decreases exponentially. The CDF $P_X(x) = p(X \le x)$
for this distribution does not exist in closed form. It is defined in terms of the Gaussian Q function as

$$P_X(x) = p(X \le x) = 1 - Q\left(\frac{x - \mu_X}{\sigma_X}\right), \tag{B.19}$$

where the Gaussian Q function, defined by

$$Q(x) \triangleq \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy, \tag{B.20}$$

is the probability that a Gaussian random variable X with mean zero and variance one is bigger than x:
$Q(x) = p(X \ge x)$ for $X \sim N(0, 1)$. The Gaussian Q function is related to the complementary error function as
$Q(x) = .5\,\mathrm{erfc}(x/\sqrt{2})$. These functions are typically calculated using standard computer math packages.
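As one example of such a calculation, the Q function and the Gaussian CDF of (B.19) can be written directly in terms of the complementary error function in Python's standard `math` module. The function names below are hypothetical.

```python
import math

def q_function(x):
    """Gaussian Q function: p(X >= x) for X ~ N(0, 1), via Q(x) = 0.5 erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gaussian_cdf(x, mean, std):
    """CDF of N(mean, std^2) from (B.19): p(X <= x) = 1 - Q((x - mean) / std)."""
    return 1.0 - q_function((x - mean) / std)

tail_at_one_sigma = q_function(1.0)            # about 0.1587
cdf_at_mean = gaussian_cdf(3.0, 3.0, 2.0)      # half the mass lies below the mean
```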

Let $\mathbf{X} = (X_1, \ldots, X_n)$ denote a vector of jointly Gaussian random variables. Their joint distribution is given
by

$$p_{X_1 \ldots X_n}(x_1, \ldots, x_n) = \frac{1}{\sqrt{(2\pi)^n \det[\Sigma]}} \exp\left[-.5\,(\mathbf{x} - \mu_{\mathbf{X}})^T \Sigma^{-1} (\mathbf{x} - \mu_{\mathbf{X}})\right], \tag{B.21}$$

where $\mu_{\mathbf{X}} = E[\mathbf{X}]^T = (E[X_1], \ldots, E[X_n])^T$ is the mean of $\mathbf{X}$ and $\Sigma$ is the $n \times n$ covariance matrix of $\mathbf{X}$,
i.e. $\Sigma_{ij} = \mathrm{Cov}[X_i X_j]$. It can be shown from (B.21) that for jointly Gaussian random variables X and Y, if
$\mathrm{Cov}[XY] = 0$ then $p_{XY}(x, y) = p_X(x)\,p_Y(y)$. In other words, Gaussian random variables that are uncorrelated
are independent.

The underlying reason why the Gaussian distribution commonly occurs in communication system models
is the Central Limit Theorem (CLT), which defines the limiting distribution for the sum of a large number of
independent random variables with the same distribution. Specifically, let $X_i$ be independent and identically
distributed (i.i.d.) joint random variables. Let $Y_n = \sum_{i=1}^{n} X_i$ and $Z_n = (Y_n - \mu_{Y_n})/\sigma_{Y_n}$. The CLT states that
the distribution of $Z_n$ as n goes to infinity converges to a Gaussian distribution with mean zero and variance one:
$\lim_{n \to \infty} p_{Z_n}(x) = N(0, 1)$. Thus, any random variable equal to the sum of a large number of i.i.d. random
components has a distribution that is approximately Gaussian. For example, noise in a radio receiver typically
consists of spurious signals generated by the various hardware components, and with a large number of i.i.d.
components this noise is accurately modeled as Gaussian-distributed.
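A quick simulation gives a feel for the CLT. The sketch below assumes i.i.d. components that are uniform on [0, 1], sums 12 of them, standardizes the sum, and estimates how often the result falls within one standard deviation of zero; for an N(0, 1) variable that probability is 1 - 2Q(1), roughly 0.683. The number of components, the sample count, and the seed are arbitrary choices.

```python
import math
import random

def standardized_sum(rng, n):
    """Z_n = (Y_n - mean) / std for Y_n the sum of n i.i.d. U[0, 1] variables."""
    y = sum(rng.random() for _ in range(n))
    # A U[0, 1] variable has mean 1/2 and variance 1/12, so Y_n has mean n/2, variance n/12.
    return (y - n * 0.5) / math.sqrt(n / 12.0)

rng = random.Random(1)   # fixed seed so the run is repeatable
samples = [standardized_sum(rng, 12) for _ in range(20_000)]

# Empirical estimate of p(|Z_n| <= 1).
frac_within_one_sigma = sum(abs(z) <= 1.0 for z in samples) / len(samples)
```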

Two other common distributions that arise in communication systems are the uniform distribution and the
binomial distribution. A random variable X that is uniformly distributed has pdf $p_X(x) = 1/(b - a)$ for x in
the interval [a, b] and zero otherwise. A random phase $\theta$ is commonly modeled as uniformly distributed on the
interval $[0, 2\pi]$, which we denote as $\theta \sim U[0, 2\pi]$. The binomial distribution often arises in coding analysis. Let
$X_i$, $i = 1, \ldots, n$, be discrete random variables that take two possible values, 0 and 1. Suppose the $X_i$ are i.i.d.
with $p(X_i = 1) = p$ and $p(X_i = 0) = 1 - p$. Let $Y = \sum_{i=1}^{n} X_i$. Then Y is a discrete random variable that takes
integer values $k = 0, 1, \ldots, n$. The distribution of Y is the binomial distribution, given by

$$p(Y = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \tag{B.22}$$

where

$$\binom{n}{k} \triangleq \frac{n!}{k!\,(n - k)!}. \tag{B.23}$$
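The binomial distribution (B.22) is simple to evaluate numerically. In the sketch below the block length n = 7 and the bit error probability p = 0.1 are assumed values, chosen only to resemble a coding-analysis setting.

```python
import math

def binomial_pmf(k, n, p):
    """p(Y = k) for Y the sum of n i.i.d. bits with p(X_i = 1) = p, as in (B.22)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

n, p = 7, 0.1   # assumed example: 7-bit block, each bit in error with probability 0.1
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Probability of two or more bit errors in the block.
prob_two_or_more = 1.0 - pmf[0] - pmf[1]
```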



B.3 Random Processes 

A random process X(t) is defined on an underlying probability space $(\Omega, \mathcal{E}, p(\cdot))$. In particular, it is a function
mapping from the sample space $\Omega$ to a set of real functions $\{x_1(t), x_2(t), \ldots\}$, where each $x_i(t)$ is a possible
realization of X(t). Samples of X(t) at times $t_0, t_1, \ldots, t_n$ are joint random variables defined on the underlying
probability space. Thus, the joint CDF of samples at times $t_0, t_1, \ldots, t_n$ is given by
$P_{X(t_0)X(t_1)\ldots X(t_n)}(x_0, \ldots, x_n) = p(X(t_0) \le x_0, X(t_1) \le x_1, \ldots, X(t_n) \le x_n)$. The random process X(t) is fully characterized by its joint CDF
$P_{X(t_0)X(t_1)\ldots X(t_n)}(x_0, \ldots, x_n)$ for all possible sets of sample times $\{t_0, t_1, \ldots, t_n\}$.

A random process X(t) is stationary if for all T and all sets of sample times $\{t_0, \ldots, t_n\}$, we have that
$p(X(t_0) \le x_0, X(t_1) \le x_1, \ldots, X(t_n) \le x_n) = p(X(t_0 + T) \le x_0, X(t_1 + T) \le x_1, \ldots, X(t_n + T) \le x_n)$.
Intuitively, a random process is stationary if time shifts do not affect its joint probability distribution. Stationarity of a process is
often difficult to prove since it requires checking the joint CDF of all possible sets of samples for all possible time
shifts. Stationarity of a random process is often inferred from the stationarity of the source generating the process.

The mean of a random process is defined as $E[X(t)]$. Since the mean of a stationary random process is
independent of time shifts, it must be constant: $E[X(t)] = E[X(t - t)] = E[X(0)] = \mu_X$. The autocorrelation
of a random process is defined as $A_X(t, t + \tau) \triangleq E[X(t)X(t + \tau)]$. The autocorrelation of X(t) is also called
its second moment. Since the autocorrelation of a stationary process is independent of time shifts,
$A_X(t, t + \tau) = E[X(t - t)X(t + \tau - t)] = E[X(0)X(\tau)] \triangleq A_X(\tau)$. So for stationary processes, the autocorrelation depends only
on the time difference $\tau$ between the samples X(t) and X(t + $\tau$) and not on the absolute time t. The autocorrelation
of a process measures the correlation between samples of the process taken at different times.

Two random processes X(t) and Y(t) defined on the same underlying probability space have a joint CDF
characterized by

$$P_{X(t_0)\ldots X(t_n)Y(t'_0)\ldots Y(t'_m)}(x_0, \ldots, x_n, y_0, \ldots, y_m) \triangleq p(X(t_0) \le x_0, \ldots, X(t_n) \le x_n, Y(t'_0) \le y_0, \ldots, Y(t'_m) \le y_m) \tag{B.24}$$

for all possible sets of sample times $\{t_0, t_1, \ldots, t_n\}$ and $\{t'_0, t'_1, \ldots, t'_m\}$. Two random processes X(t) and Y(t)
are independent if for all such sets we have that

$$P_{X(t_0)\ldots X(t_n)Y(t'_0)\ldots Y(t'_m)}(x_0, \ldots, x_n, y_0, \ldots, y_m) = P_{X(t_0)\ldots X(t_n)}(x_0, \ldots, x_n)\,P_{Y(t'_0)\ldots Y(t'_m)}(y_0, \ldots, y_m). \tag{B.25}$$

The cross-correlation between two random processes X(t) and Y(t) is defined as $A_{XY}(t, t + \tau) \triangleq E[X(t)Y(t + \tau)]$.
The two processes are uncorrelated if $E[X(t)Y(t + \tau)] = E[X(t)]E[Y(t + \tau)]$ for all t and $\tau$. As with the
autocorrelation, if both X(t) and Y(t) are stationary, the cross-correlation is only a function of $\tau$:
$A_{XY}(t, t + \tau) = E[X(t - t)Y(t + \tau - t)] = E[X(0)Y(\tau)] \triangleq A_{XY}(\tau)$.

In most analysis of random processes we focus only on the first and second moments. Wide-sense stationarity
is a notion of stationarity that only depends on the first two moments of a process, and it can also be easily
verified. Specifically, a process is wide-sense stationary (WSS) if its mean is constant, $E[X(t)] = \mu_X$, and its
autocorrelation depends only on the time difference of the samples, $A_X(t, t + \tau) = E[X(t)X(t + \tau)] = A_X(\tau)$.
Stationary processes are WSS but in general WSS processes are not necessarily stationary. For WSS processes, the
autocorrelation is a symmetric function of $\tau$, since $A_X(\tau) = E[X(t)X(t + \tau)] = E[X(t + \tau)X(t)] = A_X(-\tau)$.
Moreover, it can be shown that $A_X(\tau)$ takes its maximum value at $\tau = 0$, i.e. $|A_X(\tau)| \le A_X(0) = E[X^2(t)]$. As
with stationary processes, if two processes X(t) and Y(t) are both WSS then their cross-correlation is independent
of time shifts, and thus depends only on the time difference of the processes: $A_{XY}(t, t + \tau) = E[X(0)Y(\tau)] = A_{XY}(\tau)$.

The power spectral density (PSD) of a WSS process is defined as the Fourier transform of its autocorrelation
function with respect to $\tau$:

$$S_X(f) = \int_{-\infty}^{\infty} A_X(\tau)\,e^{-j2\pi f\tau}\,d\tau. \tag{B.26}$$

The autocorrelation can be obtained from the PSD through the inverse transform:

$$A_X(\tau) = \int_{-\infty}^{\infty} S_X(f)\,e^{j2\pi f\tau}\,df. \tag{B.27}$$

The PSD takes its name from the fact that the expected power of a random process X(t) is the integral of its PSD:

$$E[X^2(t)] = \int_{-\infty}^{\infty} S_X(f)\,df, \tag{B.28}$$

which follows from (B.27) evaluated at $\tau = 0$, since $A_X(0) = E[X^2(t)]$. Similarly, from (B.26) we get that
$S_X(0) = \int_{-\infty}^{\infty} A_X(\tau)\,d\tau$. The symmetry of $A_X(\tau)$
can be used with (B.26) to show that $S_X(f)$ is also symmetric, i.e. $S_X(f) = S_X(-f)$. White noise is defined as a
zero mean WSS random process with a PSD that is constant over all frequencies. Thus, a white noise process X(t)
has $E[X(t)] = 0$ and $S_X(f) = N_0/2$ for some constant $N_0$ which is typically referred to as the (one-sided) white
noise PSD. By the inverse Fourier transform, the autocorrelation of white noise is given by $A_X(\tau) = (N_0/2)\delta(\tau)$.
In some sense, white noise is the most random of all possible noise processes, since it decorrelates instantaneously.

Random processes are often filtered or modulated, and when the process is WSS the impact of these operations
can be characterized in a simple way. In particular, if a WSS process with PSD $S_X(f)$ is passed through a linear
time-invariant filter with frequency response H(f), then the filter output is also a WSS process with power spectral
density $|H(f)|^2 S_X(f)$. If a WSS process X(t) with PSD $S_X(f)$ is multiplied by a carrier $\cos(2\pi f_c t + \theta)$ with
$\theta \sim U[0, 2\pi]$, the multiplication results in a WSS process $X(t)\cos(2\pi f_c t + \theta)$ with PSD $.25[S_X(f - f_c) + S_X(f + f_c)]$.

Stationarity and wide-sense stationarity are properties of the underlying probability space associated with a random process.
We are also often interested in time-averages associated with random processes, which can be characterized by
different notions of ergodicity. A random process X(t) is ergodic in the mean if its time-averaged mean, defined
as

$$\mu_X^{ta} \triangleq \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X(t)\,dt, \tag{B.29}$$

is constant for all possible realizations of X(t). In other words, X(t) is ergodic in the mean if $\lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} x_i(t)\,dt$
equals the same constant $\mu_X^{ta}$ for all possible realizations $x_i(t)$ of X(t). Similarly, a random process X(t) is ergodic
in the nth moment if its time-averaged nth moment

$$\mu_{X^n}^{ta} \triangleq \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X^n(t)\,dt \tag{B.30}$$



is constant for all possible realizations of X(t). We can also define ergodicity of X(t) relative to its time-averaged
autocorrelation

$$A_X^{ta}(\tau) \triangleq \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X(t)X(t + \tau)\,dt. \tag{B.31}$$

Specifically, X(t) is ergodic in autocorrelation if $\lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} x_i(t)x_i(t + \tau)\,dt$ equals the same value $A_X^{ta}(\tau)$
for all possible realizations $x_i(t)$ of X(t). Ergodicity of the autocorrelation in higher order moments requires that
the nmth order time-averaged autocorrelation

$$A_X^{ta}(n, m, \tau) \triangleq \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X^n(t)X^m(t + \tau)\,dt \tag{B.32}$$



is constant for all realizations of X(t). A process that is ergodic in all order moments and autocorrelations is called
ergodic. Ergodicity of a process requires that its time-averaged nth moment and ijth autocorrelation, averaged
over all time, be constant for all n, i, and j. This implies that the probability associated with an ergodic process is
independent of time shifts, and thus the process is stationary. In other words, an ergodic process must be stationary.
However, a stationary process can be either ergodic or nonergodic. Since an ergodic process is stationary,

$$\begin{aligned}
\mu_X^{ta} &= E[\mu_X^{ta}] \\
&= E\left[\lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X(t)\,dt\right] \\
&= \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} E[X(t)]\,dt \\
&= \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} \mu_X\,dt = \mu_X.
\end{aligned} \tag{B.33}$$







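Thus, the time-averaged mean of X(t) equals its probabilistic mean. Similarly,

$$\begin{aligned}
A_X^{ta}(\tau) &= E[A_X^{ta}(\tau)] \\
&= E\left[\lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X(t)X(t + \tau)\,dt\right] \\
&= \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} E[X(t)X(t + \tau)]\,dt \\
&= \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} A_X(\tau)\,dt = A_X(\tau),
\end{aligned} \tag{B.34}$$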

so the time-averaged autocorrelation of X(t) equals its probabilistic autocorrelation. 
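Ergodicity in the mean can be illustrated by comparing time averages across realizations. The sketch below uses two assumed example processes that are not taken from the text: a sinusoid with a random phase, whose time-averaged mean is close to zero for every phase, and a process that is constant in time at a randomly drawn level, whose time average depends on the realization. The integral in (B.29) is approximated over a finite window with a midpoint rule.

```python
import math

def time_average(x, T, steps=20_000):
    """Midpoint-rule estimate of (1 / 2T) * integral of x(t) over [-T, T]."""
    h = 2.0 * T / steps
    return sum(x(-T + (k + 0.5) * h) for k in range(steps)) * h / (2.0 * T)

def sinusoid(theta, freq=1.0):
    """One realization of X(t) = cos(2 pi f t + theta) for a fixed phase theta."""
    return lambda t: math.cos(2.0 * math.pi * freq * t + theta)

def constant(level):
    """One realization of X(t) = A, where A was drawn once and equals level."""
    return lambda t: level

T = 500.3   # finite window standing in for the limit T -> infinity
sin_averages = [time_average(sinusoid(th), T) for th in (0.0, 1.0, 2.5)]
const_averages = [time_average(constant(a), T) for a in (-1.0, 1.0)]
```

The sinusoid's averages agree across realizations, as ergodicity in the mean requires, while the constant-level process gives a different time average for each realization.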



B.4 Gaussian Processes 



Noise processes in communication systems are commonly modeled as Gaussian processes. A random process X(t)
is a Gaussian process if for all values of T and all functions g(t) the random variable

$$X_g = \int_{0}^{T} g(t)X(t)\,dt \tag{B.35}$$

has a Gaussian distribution. Since a communication receiver typically uses an integrator in signal detection, this
definition implies that if the channel introduces a Gaussian noise process at the receiver input, the random variable
associated with the noise at the output of the integrator will have a Gaussian distribution. The mean of $X_g$ is

$$E[X_g] = \int_{0}^{T} g(t)E[X(t)]\,dt$$

and the variance is

$$\mathrm{Var}[X_g] = \int_{0}^{T}\int_{0}^{T} g(t)g(s)E[X(t)X(s)]\,dt\,ds - (E[X_g])^2.$$

If X(t) is WSS these simplify to

$$E[X_g] = \int_{0}^{T} g(t)\mu_X\,dt$$

and

$$\mathrm{Var}[X_g] = \int_{0}^{T}\int_{0}^{T} g(t)g(s)A_X(s - t)\,dt\,ds - (E[X_g])^2.$$



Several important properties of Gaussian random processes can be obtained from the definition. In particular,
if a Gaussian random process is input to a linear time-invariant filter, the filter output is also a Gaussian random
process. Moreover, we expect samples $X(t_i)$, $i = 0, 1, \ldots$ of a Gaussian random process to be jointly Gaussian
random variables, and indeed that follows from the definition by setting $g(t) = \delta(t - t_i)$ in (B.35). Since these
samples are Gaussian random variables, if the samples are uncorrelated, they are also independent. In addition,
for a WSS Gaussian process, the distribution of $X_g$ in (B.35) only depends on the mean and autocorrelation
of the process X(t). Finally, note that a random process is completely defined by the joint probability of its
samples over all sets of sample times. For a Gaussian process, these samples are jointly Gaussian with their joint
distribution determined by the mean and autocorrelation of the process. Thus, since the underlying probability of
a Gaussian process is completely determined by its mean and autocorrelation, higher moments impose no further
conditions on the process, so a WSS Gaussian process is also stationary. Similarly, a Gaussian process that is ergodic in the mean
and autocorrelation is an ergodic process.
Appendix C

Matrix Definitions, Operations, and Properties 



C.1 Matrices and Vectors



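An $N \times M$ matrix $\mathbf{A}$ is a rectangular array of values with N rows and M columns, written as

$$\mathbf{A} = \begin{bmatrix} a_{11} & \cdots & a_{1M} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NM} \end{bmatrix}. \tag{C.1}$$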



The ijth element (or entry) of $\mathbf{A}$, i.e. the element in the ith row and jth column, is written as $\mathbf{A}_{ij}$. In (C.1) we
have $\mathbf{A}_{ij} = a_{ij}$. The matrix elements are also called scalars to indicate that they are single numbers. An $N \times M$
matrix is called a square matrix if N = M, a skinny matrix if N > M and a fat matrix if N < M.

The diagonal elements of a square matrix are the elements along the diagonal line from the top left to the
bottom right of the matrix, i.e. the elements $\mathbf{A}_{ij}$ with i = j. The trace of a square $N \times N$ matrix is the sum of
its diagonal elements: $\mathrm{Tr}[\mathbf{A}] = \sum_{i=1}^{N} \mathbf{A}_{ii}$. A square matrix is called a diagonal matrix if all elements that are not
diagonal elements, referred to as the off-diagonal elements, are zero: $\mathbf{A}_{ij} = 0$, $j \neq i$. We denote a diagonal matrix
with diagonal elements $a_1, \ldots, a_N$ as $\mathrm{diag}[a_1, \ldots, a_N]$. The $N \times N$ identity matrix $\mathbf{I}_N$ is a diagonal matrix with
$(\mathbf{I}_N)_{ii} = 1$, $i = 1, \ldots, N$, i.e. $\mathbf{I}_N = \mathrm{diag}[1, \ldots, 1]$. The subscript N of $\mathbf{I}_N$ is omitted when the size is clear from the
context (e.g. from the size requirements for a given operation like matrix multiplication).

A square matrix $\mathbf{A}$ is called upper triangular if all its elements below the diagonal are zero, i.e. $\mathbf{A}_{ij} = 0$, $i > j$.
A lower triangular matrix is a square matrix where all elements above the diagonal are zero, i.e. $\mathbf{A}_{ij} = 0$, $i < j$.
Diagonal matrices are both upper triangular and lower triangular.

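Matrices can be formed from entries that are themselves matrices, as long as the dimensions are consistent.
In particular, if $\mathbf{B}$ is an $N \times M_1$ matrix and $\mathbf{C}$ is an $N \times M_2$ matrix then we can form the $N \times (M_1 + M_2)$ matrix
$\mathbf{A} = [\mathbf{B}\ \mathbf{C}]$. The ith row of this matrix is $[\mathbf{A}_{i1} \ldots \mathbf{A}_{i(M_1 + M_2)}] = [\mathbf{B}_{i1} \ldots \mathbf{B}_{iM_1}\ \mathbf{C}_{i1} \ldots \mathbf{C}_{iM_2}]$. The matrix $\mathbf{A}$
formed in this way is also written as $\mathbf{A} = [\mathbf{B}|\mathbf{C}]$. If we also have a $K \times L_1$ matrix $\mathbf{D}$ and a $K \times L_2$ matrix $\mathbf{E}$ then
if $M_1 + M_2 = L_1 + L_2$ we can form the $(N + K) \times (M_1 + M_2)$ matrix

$$\mathbf{A} = \begin{bmatrix} \mathbf{B} & \mathbf{C} \\ \mathbf{D} & \mathbf{E} \end{bmatrix}. \tag{C.2}$$

The matrices $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$, and $\mathbf{E}$ are called submatrices of $\mathbf{A}$. A matrix can be composed of any number of submatrices
as long as the sizes are compatible. A submatrix $\mathbf{A}'$ of $\mathbf{A}$ can also be obtained by deleting certain rows and/or
columns of $\mathbf{A}$.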
A matrix with only one column, i.e., with M = 1, is called a column vector or just a vector. The number of
rows of a vector is called its dimension. For example, an N-dimensional vector $\mathbf{x}$ is given by

$$\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}. \tag{C.3}$$



The ith element of vector $\mathbf{x}$ is written as $\mathbf{x}_i$. We call an N-dimensional vector with each element equal to one a
ones vector and denote it by $\mathbf{1}_N$. An N-dimensional vector with one element equal to one and the rest equal to
zero is called a unit vector. In particular, the ith unit vector $\mathbf{e}^i$ has $\mathbf{e}^i_i = 1$ and $\mathbf{e}^i_j = 0$, $j \neq i$. A matrix with only
one row, i.e., with N = 1, is called a row vector. The number of columns in a row vector is called its dimension,
so an M-dimensional row vector $\mathbf{x}$ is given by $\mathbf{x} = [x_1 \ldots x_M]$ with ith element $\mathbf{x}_i = x_i$. The Euclidean norm of
an N-dimensional row vector or vector, also called its norm, is defined as

$$\|\mathbf{x}\| = \sqrt{\sum_{i=1}^{N} |x_i|^2}. \tag{C.4}$$



C.2 Matrix and Vector Operations 

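If $\mathbf{A}$ is an $N \times M$ matrix, the transpose of $\mathbf{A}$, denoted $\mathbf{A}^T$, is the $M \times N$ matrix defined by $\mathbf{A}^T_{ij} = \mathbf{A}_{ji}$:

$$\mathbf{A}^T = \begin{bmatrix} a_{11} & \cdots & a_{1M} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NM} \end{bmatrix}^T = \begin{bmatrix} a_{11} & \cdots & a_{N1} \\ \vdots & \ddots & \vdots \\ a_{1M} & \cdots & a_{NM} \end{bmatrix}. \tag{C.5}$$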



In other words, $\mathbf{A}^T$ is obtained by transposing the rows and columns of $\mathbf{A}$, so the ith row of $\mathbf{A}$ becomes the ith
column of $\mathbf{A}^T$. The transpose of a row vector $\mathbf{x} = [x_1 \ldots x_N]$ yields a vector with the same elements:

$$\mathbf{x}^T = [x_1 \ldots x_N]^T = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}. \tag{C.6}$$

We therefore often write a column vector $\mathbf{x}$ with elements $x_i$ as $\mathbf{x} = [x_1 \ldots x_N]^T$. Similarly, the transpose of an
N-dimensional vector $\mathbf{x}$ with ith element $x_i$ is the row vector $[x_1 \ldots x_N]$. Note that for $\mathbf{x}$ a row vector or vector,
$(\mathbf{x}^T)^T = \mathbf{x}$.

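The complex conjugate $\mathbf{A}^*$ of a matrix $\mathbf{A}$ is obtained by taking the complex conjugate of each element of $\mathbf{A}$:

$$\mathbf{A}^* = \begin{bmatrix} a_{11} & \cdots & a_{1M} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NM} \end{bmatrix}^* = \begin{bmatrix} a^*_{11} & \cdots & a^*_{1M} \\ \vdots & \ddots & \vdots \\ a^*_{N1} & \cdots & a^*_{NM} \end{bmatrix}. \tag{C.7}$$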



The Hermitian of a matrix $\mathbf{A}$, denoted as $\mathbf{A}^H$, is defined as its conjugate transpose: $\mathbf{A}^H = (\mathbf{A}^*)^T$. Note that
applying the Hermitian operation twice results in the original matrix: $(\mathbf{A}^H)^H = \mathbf{A}$, so $\mathbf{A}$ is the Hermitian of
$\mathbf{A}^H$. A square matrix $\mathbf{A}$ is a Hermitian matrix if it equals its Hermitian: $\mathbf{A} = \mathbf{A}^H$. The complex conjugate and
Hermitian operators can also be applied to vectors. In particular, the complex conjugate of a vector or row vector
$\mathbf{x}$, denoted as $\mathbf{x}^*$, is obtained by taking the complex conjugate of each element of $\mathbf{x}$. The Hermitian of a vector $\mathbf{x}$,
denoted as $\mathbf{x}^H$, is its conjugate transpose: $\mathbf{x}^H = (\mathbf{x}^*)^T$.
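The transpose, complex conjugate, and Hermitian operations translate directly into code. The sketch below stores matrices as nested Python lists, which is an assumption made to keep the example free of external libraries; the function names and example matrices are hypothetical.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def conjugate(A):
    """Element-wise complex conjugate."""
    return [[complex(a).conjugate() for a in row] for row in A]

def hermitian(A):
    """Conjugate transpose: A^H = (A^*)^T."""
    return transpose(conjugate(A))

A = [[1 + 2j, 3j], [4, 5 - 1j], [0, 2]]    # assumed 3 x 2 example
G = [[2, 1 - 1j], [1 + 1j, 3]]             # assumed example that equals its own Hermitian

A_H = hermitian(A)                          # a 2 x 3 matrix
is_hermitian_matrix = hermitian(G) == G
```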

Two $N \times M$ matrices can be added together to form a new matrix of size $N \times M$. The addition is done
element-by-element. In other words, if two $N \times M$ matrices $\mathbf{A}$ and $\mathbf{B}$ are added, the resulting $N \times M$ matrix
$\mathbf{C} = \mathbf{A} + \mathbf{B}$ has ijth element $\mathbf{C}_{ij} = \mathbf{A}_{ij} + \mathbf{B}_{ij}$. Since matrix addition is done element-by-element, it inherits the
commutative and associative properties of addition, i.e. $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$, and $(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$. The
transpose of a sum of matrices is the sum of the transposes of the individual matrices: $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$.
Matrix subtraction is similar: for two $N \times M$ matrices $\mathbf{A}$ and $\mathbf{B}$, $\mathbf{C} = \mathbf{A} - \mathbf{B}$ is an $N \times M$ matrix with ijth element
$\mathbf{C}_{ij} = \mathbf{A}_{ij} - \mathbf{B}_{ij}$. Two row vectors or vectors of the same dimension can be added using this definition of matrix
addition since these vectors are special cases of matrices. In particular, an N-dimensional vector $\mathbf{x}$ can be added
to another vector $\mathbf{y}$ of the same dimension to form the new N-dimensional vector $\mathbf{z} = \mathbf{x} + \mathbf{y}$ with ith element
$\mathbf{z}_i = \mathbf{x}_i + \mathbf{y}_i$. Similarly, if $\mathbf{x}$ and $\mathbf{y}$ are row vectors of dimension N, their sum $\mathbf{z} = \mathbf{x} + \mathbf{y}$ is an N-dimensional row
vector with ith element $\mathbf{z}_i = \mathbf{x}_i + \mathbf{y}_i$. However, a row vector of dimension N > 1 cannot be added to a vector of
dimension N, since these vectors are matrices of different sizes ($1 \times N$ for the row vector, $N \times 1$ for the vector).
The linear combination of vectors $\mathbf{x}$ and $\mathbf{y}$ of dimension N yields a new N-dimensional vector $\mathbf{z} = c\mathbf{x} + d\mathbf{y}$ with
ith element $\mathbf{z}_i = c\mathbf{x}_i + d\mathbf{y}_i$, where c and d are arbitrary scalars. Similarly, row vectors $\mathbf{x}$ and $\mathbf{y}$ of dimension N
can be linearly combined to form the N-dimensional row vector $\mathbf{z} = c\mathbf{x} + d\mathbf{y}$ with ith element $\mathbf{z}_i = c\mathbf{x}_i + d\mathbf{y}_i$ for
arbitrary scalars c and d.

A matrix can be multiplied by a scalar, in which case every element of the matrix is multiplied by the scalar.
Specifically, multiplication of the matrix $\mathbf{A}$ by a scalar k results in the matrix $k\mathbf{A}$ given by

$$k\mathbf{A} = \begin{bmatrix} ka_{11} & \cdots & ka_{1M} \\ \vdots & \ddots & \vdots \\ ka_{N1} & \cdots & ka_{NM} \end{bmatrix}. \tag{C.8}$$

A row vector $\mathbf{x}$ multiplied by scalar k yields $k\mathbf{x} = [kx_1 \ldots kx_N]$, and a vector $\mathbf{x}$ multiplied by scalar k yields
$k\mathbf{x} = [kx_1 \ldots kx_N]^T$.

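Two matrices can be multiplied together provided they have compatible dimensions. In particular, matrices $\mathbf{A}$
and $\mathbf{B}$ can be multiplied if the number of columns of $\mathbf{A}$ equals the number of rows of $\mathbf{B}$. If $\mathbf{A}$ is an $N \times M$ matrix
and $\mathbf{B}$ is an $M \times L$ matrix then their product $\mathbf{C} = \mathbf{A}\mathbf{B}$ is an $N \times L$ matrix with ijth element $\mathbf{C}_{ij} = \sum_{k=1}^{M} \mathbf{A}_{ik}\mathbf{B}_{kj}$.
Matrix multiplication is not commutative in general, i.e. in general $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$. In fact, if $\mathbf{A}$ is an $N \times M$ matrix
and $\mathbf{B}$ is an $M \times L$ matrix then the product $\mathbf{B}\mathbf{A}$ only exists if L = N. In this case $\mathbf{B}\mathbf{A}$ is an $M \times M$ matrix, which
may be a different size than the $N \times L$ matrix $\mathbf{A}\mathbf{B}$. Even if M = L = N, so that $\mathbf{A}\mathbf{B}$ and $\mathbf{B}\mathbf{A}$ are the same size,
they may not be equal. If $\mathbf{A}$ is a square matrix then we can multiply $\mathbf{A}$ by itself. In particular, we define $\mathbf{A}^2 = \mathbf{A}\mathbf{A}$.
Similarly $\mathbf{A}^k = \mathbf{A} \cdots \mathbf{A}$ is the product of k copies of $\mathbf{A}$. This implies that $\mathbf{A}^k\mathbf{A}^l = \mathbf{A}^{k+l}$. Multiplication of any
matrix by the identity matrix of compatible size results in the same matrix, i.e. if $\mathbf{A}$ is an $N \times M$ matrix, then
$\mathbf{I}_N\mathbf{A} = \mathbf{A}\mathbf{I}_M = \mathbf{A}$. The transpose of a matrix product is the product of the transpose of the individual matrices in
reverse order: $(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T$. The product of an $N \times M$ matrix $\mathbf{A}$ and its $M \times N$ Hermitian $\mathbf{A}^H$ is a square
matrix. In particular, $\mathbf{A}\mathbf{A}^H$ is an $N \times N$ square matrix while $\mathbf{A}^H\mathbf{A}$ is an $M \times M$ square matrix. The Frobenius
norm of a matrix $\mathbf{A}$, denoted as $\|\mathbf{A}\|_F$, is defined as $\|\mathbf{A}\|_F = \sqrt{\mathrm{Tr}[\mathbf{A}\mathbf{A}^H]} = \sqrt{\mathrm{Tr}[\mathbf{A}^H\mathbf{A}]} = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{M} |\mathbf{A}_{ij}|^2}$.
Matrix multiplication is associative, i.e. $(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})$ as long as the matrix dimensions are compatible for
multiplication, so the parentheses are typically omitted. Matrix multiplication is also distributive: $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$
and $(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{A}\mathbf{C} + \mathbf{B}\mathbf{C}$.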
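Several of these properties can be checked on small examples. The sketch below implements the product formula with nested lists (an assumed representation) and uses two assumed 2 x 2 matrices to show that the product is not commutative, that the transpose of a product reverses the order, and how the Frobenius norm is computed.

```python
import math

def matmul(A, B):
    """Product of an N x M matrix A and an M x L matrix B, as nested lists."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_norm(A):
    """Square root of the sum of squared magnitudes of all elements."""
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

A = [[1, 2], [3, 4]]   # assumed example
B = [[0, 1], [1, 0]]   # assumed example: swaps columns (AB) or rows (BA)
AB, BA = matmul(A, B), matmul(B, A)
```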

An M-dimensional vector can be multiplied by a matrix with M columns. Specifically, if $\mathbf{A}$ is an $N \times M$
matrix and $\mathbf{x}$ is an M-dimensional vector (i.e. an $M \times 1$ matrix) then their product yields an N-dimensional vector
$\mathbf{y} = \mathbf{A}\mathbf{x}$ with ith element $\mathbf{y}_i = \sum_{k=1}^{M} \mathbf{A}_{ik}\mathbf{x}_k$. Note that a matrix must left-multiply a vector, since the dimensions
are not compatible for the product $\mathbf{x}\mathbf{A}$. However, if $\mathbf{x}$ is an N-dimensional row vector, then $\mathbf{x}\mathbf{A}$ is a compatible
multiplication for $\mathbf{A}$ an $N \times M$ matrix, and results in the M-dimensional row vector $\mathbf{y} = \mathbf{x}\mathbf{A}$ with ith element
$\mathbf{y}_i = \sum_{k=1}^{N} \mathbf{x}_k\mathbf{A}_{ki}$. An N-dimensional row vector $\mathbf{x}$ can be multiplied by an N-dimensional vector $\mathbf{y}$, which
results in a scalar $z = \mathbf{x}\mathbf{y} = \sum_{i=1}^{N} \mathbf{x}_i\mathbf{y}_i$. Note that the transpose of an N-dimensional vector is an N-dimensional
row vector. The inner product of two N-dimensional vectors $\mathbf{x}$ and $\mathbf{y}$ is defined as $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y} = \sum_{i=1}^{N} \mathbf{x}_i\mathbf{y}_i$.

Given a matrix $\mathbf{A}$, a subset of rows of $\mathbf{A}$ form a linearly independent set if any row in the subset is not
equal to a linear combination of the other rows in the subset. Similarly, a subset of columns of $\mathbf{A}$ form a linearly
independent set if any column in the subset is not equal to a linear combination of the other columns in the subset.
The rank $R_{\mathbf{A}}$ of a matrix $\mathbf{A}$ is equal to the number of rows in the largest subset of linearly independent rows of $\mathbf{A}$,
which can be shown to equal the number of columns in the largest subset of linearly independent columns of $\mathbf{A}$.
This implies that the rank of an $N \times M$ matrix cannot exceed $\min[N, M]$. An $N \times M$ matrix $\mathbf{A}$ is full rank if
$R_{\mathbf{A}} = \min[N, M]$.
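One way to compute the rank in practice is Gaussian elimination: the number of pivots found equals the number of linearly independent rows. The sketch below is an illustrative implementation using exact fractions, not a method described in the text; the function name is hypothetical.

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix (nested lists of ints or Fractions) by exact Gaussian elimination."""
    rows = [[Fraction(a) for a in row] for row in A]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from all rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

deficient = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # second row is twice the first
skinny = [[1, 0], [0, 1], [1, 1]]               # 3 x 2 with independent columns
```

For the skinny example the rank equals min[3, 2] = 2, so that matrix is full rank.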

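The determinant of a $2 \times 2$ matrix $\mathbf{A}$ is defined as $\det[\mathbf{A}] = \mathbf{A}_{11}\mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{12}$. For an $N \times N$ matrix $\mathbf{A}$
that is larger than $2 \times 2$, $\det[\mathbf{A}]$ is defined recursively as

$$\det[\mathbf{A}] = \sum_{i=1}^{N} \mathbf{A}_{ij}c_{ij} \tag{C.9}$$

for any $j$: $1 \le j \le N$, where $c_{ij}$ is the co-factor corresponding to the matrix element $\mathbf{A}_{ij}$, defined as

$$c_{ij} = (-1)^{i+j}\det[\mathbf{A}'], \tag{C.10}$$

where $\mathbf{A}'$ is the submatrix of $\mathbf{A}$ obtained by deleting the ith row and jth column of $\mathbf{A}$.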
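The recursive definition in (C.9) and (C.10) maps onto a short recursive function. The sketch below uses 0-based indices, which leaves the sign $(-1)^{i+j}$ unchanged, and an assumed 3 x 3 example; cofactor expansion is shown for clarity and is not an efficient method for large matrices.

```python
def minor(A, i, j):
    """Submatrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A, j=0):
    """Determinant by cofactor expansion along column j, following (C.9)-(C.10)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[1][0] * A[0][1]
    return sum(A[i][j] * (-1) ** (i + j) * det(minor(A, i, j)) for i in range(n))

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]   # assumed 3 x 3 example

# The expansion gives the same value for every choice of column j.
values = [det(A, j) for j in range(3)]
```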

If $\mathbf{A}$ is an $N \times N$ square matrix, and there is another $N \times N$ matrix $\mathbf{B}$ such that $\mathbf{B}\mathbf{A} = \mathbf{I}_N$, then we
say that $\mathbf{A}$ is invertible or nonsingular. We call $\mathbf{B}$ the inverse of $\mathbf{A}$, and we denote this inverse as $\mathbf{A}^{-1}$. Thus,
$\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_N$. Moreover, for $\mathbf{A}^{-1}$ defined in this way, we also have that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_N$. Only square matrices
can be invertible, and the matrix inverse is the same size as the original matrix. A square invertible matrix $\mathbf{U}$ is
unitary if $\mathbf{U}\mathbf{U}^H = \mathbf{I}$, which implies that $\mathbf{U}^H = \mathbf{U}^{-1}$ and thus $\mathbf{U}^H\mathbf{U} = \mathbf{I}$. Not every square matrix is invertible.
If a matrix is not invertible, we say it is singular or noninvertible. The inverse of an inverse matrix is the original
matrix: $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$. The inverse of the product of matrices is the product of the inverses in opposite order:
$(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$. The kth power of the inverse is $\mathbf{A}^{-k} = (\mathbf{A}^{-1})^k$.

For a diagonal matrix $D = \mathrm{diag}[d_1, \ldots, d_N]$ with $d_i \ne 0$, $i = 1, \ldots, N$, the inverse exists and is given by
$D^{-1} = \mathrm{diag}[1/d_1, \ldots, 1/d_N]$. For a general $2 \times 2$ matrix A with $ij$th element $a_{ij}$, its inverse exists if $\det[A] \ne 0$
and is given by

$$A^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{\det[A]} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}. \qquad \text{(C.11)}$$

There are more complicated formulas for the inverse of invertible matrices with size greater than $2 \times 2$. However,
matrix inverses are usually obtained using computer math packages.

Matrix inverses are commonly used to solve systems of linear equations. In particular, consider a set of linear
equations, expressed in matrix form as 

y = Ax. (C.12) 

If the matrix A is invertible then, given y, there is a unique vector $x = A^{-1}y$ that satisfies this system of equations.
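
As a small worked illustration of solving (C.12) with the closed-form $2 \times 2$ inverse, the sketch below uses exact rational arithmetic. The helper names `inverse_2x2` and `matvec` and the example numbers are hypothetical, chosen only for this example.

```python
from fractions import Fraction


def inverse_2x2(a):
    """Inverse of a 2x2 matrix from the closed-form expression."""
    (a11, a12), (a21, a22) = a
    determinant = a11 * a22 - a21 * a12
    if determinant == 0:
        raise ValueError("matrix is singular, no inverse exists")
    d = Fraction(determinant)
    return [[a22 / d, -a12 / d], [-a21 / d, a11 / d]]


def matvec(a, x):
    """Left-multiply the vector x by the matrix a: y_i = sum_k a_ik * x_k."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]


A = [[2, 1], [5, 3]]
y = [4, 11]
x = matvec(inverse_2x2(A), y)   # the unique solution x = A^{-1} y
print(x)
print(matvec(A, x) == y)        # substituting back reproduces y
```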

C.3 Matrix Decompositions 

Given a square matrix A, a scalar value $\lambda$ for which there exists a nonzero vector x such that $Ax = \lambda x$ is called
an eigenvalue of A. The vector x is called the eigenvector of A corresponding to $\lambda$. The eigenvalues of a matrix
A are all values of $\lambda$ that satisfy the characteristic equation of A, defined as $\det[A - \lambda I] = 0$. The polynomial in
$\lambda$ defined by $\det[A - \lambda I]$ is called the characteristic polynomial of A, so the eigenvalues of A are the roots of
its characteristic polynomial. The characteristic polynomial of an $N \times N$ matrix has $N$ unique roots $r_1, \ldots, r_N$,
$r_i \ne r_j$, if it is of the form $\det[A - \lambda I] = (\lambda - r_1) \cdots (\lambda - r_N)$. When the characteristic polynomial includes a
term $(\lambda - r_i)^k$, $k > 1$, we say that root $r_i$ has multiplicity $k$. For example, if $\det[A - \lambda I] = (\lambda - r_1)^2 (\lambda - r_2)^3$
then root $r_1$ has multiplicity 2 and root $r_2$ has multiplicity 3. An $N \times N$ matrix has $N$ eigenvalues $\lambda_1, \ldots, \lambda_N$,
although they will not all be unique if any of the roots of the characteristic polynomial have multiplicity greater
than 1. It can be shown that the determinant of a matrix equals the product of all its eigenvalues (i.e. an eigenvalue
$r_i$ with multiplicity $k$ would contribute $r_i^k$ to the product).
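
For a $2 \times 2$ matrix the characteristic polynomial is a quadratic, so its roots can be written out explicitly. The sketch below, with the hypothetical helper `eigenvalues_2x2`, finds both eigenvalues this way and checks that their product equals the determinant; it is an illustration for the $2 \times 2$ case only.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of det[A - lambda*I] = lambda^2 - trace*lambda + det for a 2x2 matrix."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    determinant = a11 * a22 - a21 * a12
    # cmath.sqrt handles a negative discriminant, which gives complex eigenvalues
    disc = cmath.sqrt(trace * trace - 4 * determinant)
    return (trace + disc) / 2, (trace - disc) / 2


lam1, lam2 = eigenvalues_2x2([[2, 1], [1, 2]])   # determinant is 2*2 - 1*1 = 3
print(lam1, lam2, lam1 * lam2)

# A root of multiplicity 2: both eigenvalues coincide
print(eigenvalues_2x2([[2, 0], [0, 2]]))

# A real matrix with complex eigenvalues (a 90-degree rotation)
print(eigenvalues_2x2([[0, -1], [1, 0]]))
```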

The eigenvalues of a Hermitian matrix are always real, although the eigenvectors can be complex. Moreover,
if A is an $N \times N$ Hermitian matrix then it can be written in the following form:

$$A = P \Lambda P^H, \qquad \text{(C.13)}$$

where $\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_K, 0, \ldots, 0]$ is an $N \times N$ diagonal matrix whose first $K$ diagonal elements are the
nonzero (real) eigenvalues of A. We say that a matrix A is positive definite if for all nonzero vectors x, $x^H A x > 0$.
A Hermitian matrix is positive definite if and only if all its eigenvalues are positive. Similarly, we say the matrix
A is positive semi-definite or non-negative definite if for all nonzero vectors x, $x^H A x \ge 0$. A Hermitian matrix is
non-negative definite if and only if all of its eigenvalues are non-negative.

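Suppose that A is an $N \times M$ matrix of rank $R_A$. Then there is an $N \times M$ matrix $\Sigma$ and two unitary matrices
U and V of size $N \times N$ and $M \times M$, respectively, such that

$$A = U \Sigma V^H. \qquad \text{(C.14)}$$

We call the columns of V the right eigenvectors of A and the columns of U the left eigenvectors of A. The matrix
$\Sigma$ has a special form: all elements that are not diagonal elements are zero, so

$$\Sigma_{N \times M} = \begin{bmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_M \\ 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{bmatrix} \qquad \text{(C.15)}$$

for $N \ge M$, and

$$\Sigma_{N \times M} = \begin{bmatrix} \sigma_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_N & 0 & \cdots & 0 \end{bmatrix} \qquad \text{(C.16)}$$

for $N < M$, where $\sigma_i = \sqrt{\lambda_i}$ for $\lambda_i$ the $i$th eigenvalue of $AA^H$. The values of $\sigma_i$ are called the singular
values of A, and $R_A$ of these singular values are nonzero. The decomposition (C.14) is called the singular value
decomposition of A. The singular values of a matrix are always non-negative.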
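
The relation between the singular values of A and the eigenvalues of $AA^H$ can be checked numerically in a small case. The sketch below is restricted, as an assumption made for brevity, to real matrices with exactly two rows, so that $AA^H$ is a real symmetric $2 \times 2$ matrix whose eigenvalues follow from a quadratic. The function name `singular_values_2xM` is hypothetical.

```python
import math


def singular_values_2xM(a):
    """Singular values (largest first) of a real matrix with two rows."""
    row0, row1 = a
    # Entries of the 2x2 Gram matrix G = A A^T
    g11 = sum(v * v for v in row0)
    g22 = sum(v * v for v in row1)
    g12 = sum(u * v for u, v in zip(row0, row1))
    trace = g11 + g22
    determinant = g11 * g22 - g12 * g12
    # The discriminant is non-negative for a symmetric matrix; clamp rounding error
    disc = math.sqrt(max(trace * trace - 4 * determinant, 0.0))
    eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]


full_rank = singular_values_2xM([[3, 0], [4, 5]])
print(full_rank)                 # eigenvalues of A A^T are 45 and 5

rank_one = singular_values_2xM([[1, 2], [2, 4]])
print(rank_one)                  # second row is twice the first, so only one nonzero value
```

In the second example the matrix has rank 1 and exactly one singular value is nonzero, matching the statement that $R_A$ of the singular values are nonzero.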

Let A be an $N \times M$ matrix where we denote its $i$th column as $A_i$. Treating each column as a submatrix,
we can write $A = [A_1 \; A_2 \; \ldots \; A_M]$. The vectorization of the matrix A, denoted as vec(A), is defined as the
$NM$-dimensional vector that results from stacking the columns $A_i$, $i = 1, \ldots, M$, of matrix A on top of each other
to form a vector:

$$\mathrm{vec}(A) = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_M \end{bmatrix}. \qquad \text{(C.17)}$$

Let A be an $N \times M$ matrix and B be an $L \times K$ matrix. The Kronecker product of A and B, denoted $A \otimes B$, is an
$NL \times MK$ matrix defined by

$$A \otimes B = \begin{bmatrix} A_{11}B & \cdots & A_{1M}B \\ \vdots & \ddots & \vdots \\ A_{N1}B & \cdots & A_{NM}B \end{bmatrix}. \qquad \text{(C.18)}$$
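
Both operations are short to write out for matrices stored as lists of rows. The sketch below is a plain-Python illustration; the function names `vec` and `kron` are chosen here for the example.

```python
def vec(a):
    """Stack the columns of a (a list of rows) on top of each other."""
    n_rows, n_cols = len(a), len(a[0])
    return [a[i][j] for j in range(n_cols) for i in range(n_rows)]


def kron(a, b):
    """Kronecker product: the (i, j) block of the result is a[i][j] * b."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

print(vec(A))          # first column, then second column
for row in kron(A, B):
    print(row)         # a 4 x 4 matrix built from 2 x 2 blocks
```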
Appendix D

Summary of Wireless Standards 



This chapter summarizes the technical details associated with the two most prevalent wireless systems in operation 
today: cellular phones and wireless LANs. It also summarizes the specifications for three short range wireless 
network standards that have emerged to support a broad range of applications. 

D.1 Cellular Phone Standards

D.1.1 First Generation Analog Systems

In this section we summarize cellular phone standards. We begin with the standards for first-generation (1G) analog
cellular phones, whose main characteristics are summarized in Table D.1. Systems based on these standards were
widely deployed in the 1980s. While many of these systems have been replaced by digital cellular systems, there
are many places throughout the world where these analog systems are still in use. The best known standard is the
Advanced Mobile Phone System (AMPS), developed by Bell Labs in the 1970s, and first used commercially in the
US in 1983. After its US deployment, many other countries adopted it as well. AMPS has a narrowband version,
narrowband AMPS (N-AMPS), with voice channels that are one third the bandwidth of regular AMPS. Japan
deployed the first commercial cellular phone system in 1979 with the NTT (MCS-L1) standard based on AMPS,
but at a higher frequency and with voice channels of slightly lower bandwidth. Europe also developed a standard
similar to AMPS called the Total Access Communication System (TACS). TACS operates at a higher frequency
and with lower bandwidth channels than AMPS. It was deployed in the U.K. and in other European countries as
well as outside Europe. The frequency range for TACS was extended in the U.K. to obtain more channels, leading
to a variation called ETACS. A variation of the TACS system called JTACS was deployed in metropolitan areas
of Japan in 1989 to provide higher capacity than the NTT system. JTACS operates at a slightly higher frequency
than TACS and ETACS, and has a bandwidth-efficient version called NTACS, where voice channels occupy half
the bandwidth of the channels in JTACS. In addition to TACS, countries in Europe had different incompatible
standards at different frequencies for analog cellular, including the Nordic Mobile Telephone (NMT) standard in
Scandinavia, the Radiocom 2000 (RC2000) standard in France, and the C-450 standard in Germany and Portugal.
The incompatibilities made it impossible to roam between European countries with a single analog phone, which
motivated the need for one unified cellular standard and frequency allocation throughout Europe.

|                            | AMPS    | TACS    | NMT (450/900)   | NTT         | C-450      | RC2000        |
|----------------------------|---------|---------|-----------------|-------------|------------|---------------|
| Uplink Frequencies (MHz)   | 824-849 | 890-915 | 453-458/890-915 | 925-940 [1] | 450-455.74 | 414.8-418 [2] |
| Downlink Frequencies (MHz) | 869-894 | 935-960 | 463-468/935-960 | 870-885     | 460-465.74 | 424.8-428     |
| Modulation                 | FM      | FM      | FM              | FM          | FM         | FM            |
| Channel Spacing (KHz)      | 30      | 25      | 25/12.5         | 25          | 10         | 12.5          |
| Number of Channels         | 832     | 1000    | 180/1999        | 600         | 573        | 256           |
| Multiple Access            | FDMA    | FDMA    | FDMA            | FDMA        | FDMA       | FDMA          |

Table D.1: First-Generation Analog Cellular Phone Standards

D.1.2 Second Generation Digital Systems

Next we consider second-generation (2G) digital cellular phone standards, whose main characteristics are summarized
in Table D.2. These systems were mostly deployed in the early 1990s. Due to incompatibilities in the
first-generation analog systems, in 1982 the Groupe Special Mobile (GSM) was formed to develop a uniform
digital cellular standard for all of Europe. The TACS spectrum in the 900 MHz band was allocated for GSM operation
across Europe to facilitate roaming between countries. In 1989 the GSM specification was finalized and
the system was launched in 1991, although availability was limited until 1992. The GSM standard uses TDMA
combined with slow frequency hopping to combat out-of-cell interference. Convolutional coding and parity check
codes along with interleaving are used for error correction and detection. The standard also includes an equalizer
to compensate for frequency-selective fading. The GSM standard is used in about 66% of the world’s cellphones,
with more than 470 GSM operators in 172 countries supporting over a billion users. As the GSM standard became
more global, the meaning of the acronym was changed to the Global System for Mobile Communications.

Although Europe got an early jump on developing 2G digital systems, the US was not far behind. In 1992 the 
IS-54 digital cellular standard was finalized, with commercial deployment beginning in 1994. This standard uses 
the same channel spacing, 30 KHz, as AMPS to facilitate the analog to digital transition for wireless operators, 
along with a TDMA multiple access scheme to improve handoff and control signaling over analog FDMA. The 
IS-54 standard, also called the North American Digital Cellular standard, was improved over time and these improvements
evolved into the IS-136 standard, which subsumed the original standard. Similar to the GSM standard,
the IS-136 standard uses parity check codes, convolutional codes, interleaving, and equalization.

A competing standard for 2G systems based on CDMA was proposed by Qualcomm in the early 1990s. The 
standard, called IS-95 or IS-95a, was finalized in 1993 and deployed commercially under the name cdmaOne in 
1995. Like IS-136, IS-95 was designed to be compatible with AMPS so that the two systems could coexist in
the same frequency band. In CDMA all users are superimposed on top of each other with spreading codes that
can separate out the users at the receiver. Thus, channel data rate does not apply to just one user, as in TDMA 
systems. The channel chip rate is 1.2288 Mchips/s for a total spreading factor of 128 for both the uplink and 
downlink. The spreading process in IS-95 is different for the downlink (DL) and the uplink (UL), with spreading 
on both links accomplished through a combination of spread spectrum modulation and coding. On the downlink 
data is first rate 1/2 convolutionally encoded and interleaved, then modulated by one of 64 orthogonal spreading 
sequences (Walsh functions). Then a synchronized scrambling sequence unique to each cell is superimposed on top 
of the Walsh function to reduce interference between cells. The scrambling requires synchronization between base 
stations. Uplink spreading is accomplished using a combination of a rate 1/3 convolutional code with interleaving, 
modulation by an orthogonal Walsh function, and modulation by a nonorthogonal user/base station specific code. 
The IS-95 standard includes a parity check code for error detection, as well as power control for the reverse link to 
avoid the near-far problem. A 3-finger RAKE receiver is also specified to provide diversity and compensate for ISI.
A form of base station diversity called soft handoff (SHO), whereby a mobile maintains a connection to both the
new and old base stations during handoff and combines their signals, is also included in the standard. CDMA has
some advantages over TDMA for cellular systems, including no need for frequency planning, SHO capabilities,
the ability to exploit voice activity to increase capacity, and no hard limit on the number of users that can be
accommodated in the system. There was much debate about the relative merits of the IS-54 and IS-95 standards
throughout the early 1990s, with claims that IS-95 could achieve 20 times the capacity of AMPS whereas IS-54
could only achieve 3 times this capacity. In the end, both systems turned out to achieve approximately the same
capacity increase over AMPS.

[1] NTT also operated in several other frequency bands around 900 MHz.

[2] RC2000 also operated in several other frequency bands around 200 MHz.
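
The orthogonal Walsh spreading described for the IS-95 downlink can be illustrated with a short sketch. It builds the 64 Walsh sequences with the standard recursive Hadamard construction, superimposes two users, and separates them by correlation. The noiseless, chip-synchronous channel and the helper names `walsh_codes`, `spread`, and `despread` are simplifying assumptions made here; the actual IS-95 chain also includes coding, interleaving, and scrambling.

```python
def walsh_codes(order):
    """Return 2**order mutually orthogonal +/-1 sequences of length 2**order."""
    codes = [[1]]
    for _ in range(order):
        # Sylvester construction: H_2n = [[H, H], [H, -H]]
        codes = [row + row for row in codes] + [row + [-c for c in row] for row in codes]
    return codes


def spread(symbol, code):
    """Multiply one data symbol by every chip of its spreading code."""
    return [symbol * c for c in code]


def despread(chips, code):
    """Correlate the received chips with one user's code."""
    return sum(x * c for x, c in zip(chips, code)) / len(code)


codes = walsh_codes(6)                      # 64 Walsh functions of 64 chips each

# Two users transmit at the same time; their chips add on the channel.
user_a = spread(+1, codes[3])
user_b = spread(-1, codes[5])
received = [a + b for a, b in zip(user_a, user_b)]

print(despread(received, codes[3]))         # recovers user A's symbol
print(despread(received, codes[5]))         # recovers user B's symbol

# Chip rate divided by the total spreading factor quoted in the text.
CHIP_RATE = 1_228_800                       # chips per second
SPREADING_FACTOR = 128
print(CHIP_RATE // SPREADING_FACTOR)        # information bits per second
```

Because distinct Walsh functions have zero correlation, each user's symbol is recovered without interference from the other in this idealized setting.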

The 2G digital cellular standard in Japan, called the Personal Digital Cellular (PDC) standard, was established
in 1991 and deployed in 1994. It is similar to the IS-136 standard, but with 25 KHz voice channels to be compatible
with the Japanese analog systems. This system operates in both the 900 MHz and 1500 MHz frequency bands.

Table D.2: Second-Generation Digital Cellular Phone Standards



D.1.3 Evolution of 2G Systems 

In the late 1990s 2G systems evolved in two directions: they were ported to higher frequencies as more cellular 
bandwidth became available in Europe and the US, and they were modified to support data services in addition to 
voice. Specifically, in 1994 the FCC began auctioning spectrum in the Personal Communication Systems (PCS) 
band at 1.9 GHz for cellular systems. Operators purchasing spectrum in this band could adopt any standard. 
Different operators chose different standards, so GSM, IS-136, and IS-95 were all deployed at 1900 MHz in
different parts of the country, making nationwide roaming with a single phone difficult. In fact, many of the initial
digital cellphones included an analog AMPS mode in case the digital system was not available. GSM systems
operating in the PCS band are sometimes referred to as PCS 1900 systems. The IS-136 and IS-95 (cdmaOne)
standards translated to the PCS band go by the same names. Europe allocated additional cellular spectrum in the 
1.8 GHz band. The standard for this frequency band, called GSM 1800 or DCS 1800 (for Digital Cellular System), 
uses GSM as the core standard with some modifications to allow overlays of macrocells and microcells. Note that 
second-generation cordless phones such as DECT, the Personal Access Communications System (PACS), and the 
Personal Handyphone System (PHS) also operate in the 1.9 GHz frequency band, but these systems are mostly 
within buildings supporting private branch exchange (PBX) services. 

Once digital cellular became available, operators began incorporating data services in addition to voice. The
2G systems with added data capabilities are sometimes referred to as 2.5G systems. The enhancements to 2G
systems made to support data services are summarized in Table D.3. GSM systems followed several different
upgrade paths to provide data services. The simplest, called High Speed Circuit Switched Data (HSCSD), allows
up to 4 consecutive timeslots to be assigned to a single user, thereby providing a maximum transmission rate of up
to 57.6 Kbps. Circuit switching is quite inefficient for data, so a more complex enhancement provides for packet-switched
data layered on top of the circuit-switched voice. This enhancement is referred to as General Packet Radio
Service (GPRS). A maximum data rate of 171.2 Kbps is possible with GPRS when all 8 timeslots of a GSM frame
are allocated to a single user. The data rates of GPRS are further enhanced through variable-rate modulation and
coding, referred to as Enhanced Data rates for GSM Evolution (EDGE). EDGE provides data rates up to 384 Kbps
with a bit rate of 48-69.2 Kbps per timeslot. GPRS and EDGE are compatible with IS-136 as well as GSM, and
thus provide a convergent upgrade path for both of these systems.

The IS-95 standard was modified to provide data services by assigning multiple orthogonal Walsh functions to 
a single user. A maximum of 8 functions can be assigned, leading to a maximum data rate of 115.2 Kbps, although
in practice only about 64 Kbps is achieved. This evolution is referred to as the IS-95b standard.
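
The peak rates quoted for these 2.5G enhancements all come from aggregating several timeslots or Walsh functions for one user. The sketch below works backwards from the totals and channel counts given in the text to the implied rate per timeslot or per Walsh function. It assumes the total is split evenly across the aggregated channels; the dictionary and function names are illustrative only.

```python
# (peak rate in bits per second, number of aggregated channels), as quoted in the text
AGGREGATES = {
    "HSCSD": (57_600, 4),     # up to 4 consecutive GSM timeslots
    "GPRS": (171_200, 8),     # all 8 timeslots of a GSM frame
    "IS-95b": (115_200, 8),   # up to 8 orthogonal Walsh functions
}


def per_channel_bps(total_bps, channels):
    """Rate carried by one timeslot or Walsh function under an even split."""
    return total_bps // channels


for name, (total_bps, channels) in AGGREGATES.items():
    rate_kbps = per_channel_bps(total_bps, channels) / 1000
    print(f"{name}: {rate_kbps} Kbps per channel x {channels} channels")

# EDGE: the low end of the quoted 48-69.2 Kbps per-timeslot range over 8 timeslots
edge_total_bps = 8 * 48_000
print(f"EDGE: {edge_total_bps / 1000} Kbps")
```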



Table D.3: 2G Enhancements to Support 2.5G Data Capabilities



D.1.4 Third Generation Systems 

The fragmentation of standards and frequency bands associated with 2G systems led the International Telecom- 
munications Union (ITU) in the late 1990s to formulate a plan for a single global frequency band and standard for 
third-generation (3G) digital cellular systems. The standard was named the International Mobile Telecommunications 2000
(IMT-2000) standard with a desired system rollout in the 2000 timeframe. In addition to voice services, IMT-2000
was to provide Mbps data rates for demanding applications such as broadband Internet access, interactive gaming,
and high quality audio and video entertainment. Agreement on a single standard did not materialize, with most
countries supporting one of two competing standards: cdma2000 (backward compatible with cdmaOne) supported
by the Third Generation Partnership Project 2 (3GPP2) and wideband CDMA (W-CDMA, backward compatible
with GSM and IS-136) supported by the Third Generation Partnership Project 1 (3GPP1). The main characteristics
of these two 3G standards are summarized in Table D.4. Both standards use CDMA with power control and RAKE
receivers, but the chip rates and other specification details are different. In particular, cdma2000 and W-CDMA
are not compatible standards, so a phone must be dual-mode to operate with both systems. A third 3G standard,
TD-SCDMA, is under consideration in China but is unlikely to be adopted elsewhere. The key difference between 
TD-SCDMA and the other 3G standards is its use of TDD instead of FDD for uplink/downlink signaling. 

The cdma2000 standard builds on cdmaOne to provide an evolutionary path to 3G. The core of the cdma2000
standard is referred to as cdma2000 1X or cdma2000 1XRTT, indicating that the radio transmission technology (RTT)
operates in one pair of 1.25 MHz radio channels, and is thus backwards compatible with cdmaOne systems. The
cdma2000 1X system doubles the voice capacity of cdmaOne systems and provides high-speed data services with
projected peak rates of around 300 Kbps, with actual rates of around 144 Kbps. There are two evolutions of this
core technology to provide high data rates (HDR) above 1 Mbps: these evolutions are referred to as cdma2000
1XEV. The first phase of evolution, cdma2000 1XEV-DO (Data Only), enhances the cdmaOne system using a
separate 1.25 MHz dedicated high-speed data channel that supports downlink data rates up to 3 Mbps and uplink
data rates up to 1.8 Mbps for an averaged combined rate of 2.4 Mbps. The second phase of the evolution, cdma2000
1XEV-DV (Data and Voice), is projected to support up to 4.8 Mbps data rates as well as legacy 1X voice users,
1XRTT data users, and 1XEV-DO data users, all within the same radio channel. Another proposed enhancement
to cdma2000 is to aggregate three 1.25 MHz channels into one 3.75 MHz channel. This aggregation is referred to as
cdma2000 3X, and its exact specifications are still under development.

W-CDMA is the primary competing 3G standard to cdma2000. It has been selected as the 3G successor to
GSM, and in this context is referred to as the Universal Mobile Telecommunications System (UMTS). W-CDMA
is also used in the Japanese FOMA and J-Phone 3G systems. These different systems share the W-CDMA link
layer protocol (air interface) but have different protocols for other aspects of the system such as routing and speech
compression. W-CDMA supports peak rates of up to 2.4 Mbps, with typical rates anticipated in the 384 Kbps
range. W-CDMA uses 5 MHz channels, in contrast to the 1.25 MHz channels of cdma2000. An enhancement to
W-CDMA called High Speed Downlink Packet Access (HSDPA) provides data rates of around 9 Mbps, and this may
be the precursor to 4th-generation systems. The main characteristics of the 3G cellular standards are summarized
in Table D.4.



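| 3G Standard             | cdma2000 | cdma2000 | cdma2000 | cdma2000 | W-CDMA                 |
|-------------------------|----------|----------|----------|----------|------------------------|
| Subclass                | 1X       | 1XEV-DO  | 1XEV-DV  | 3X       | UMTS, FOMA, J-Phone    |
| Channel Bandwidth (MHz) | 1.25     | 1.25     | 1.25     | 3.75     | 5                      |
| Chip Rate (Mchips/s)    | 1.2288   | 1.2288   | 1.2288   | 3.6864   | 3.84                   |
| Peak Data Rate (Mbps)   | 0.144    | 2.4      | 4.8      | 5-8      | 2.4 (8-10 with HSDPA)  |
| Modulation              | QPSK (DL), BPSK (UL) for all subclasses                                      |
| Coding                  | Convolutional (low rate), Turbo (high rate) for all subclasses               |
| Power Control           | 800 Hz   | 800 Hz   | 800 Hz   | 800 Hz   | 1500 Hz                |

Table D.4: Third-Generation Digital Cellular Phone Standards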



D.2 Wireless Local Area Networks 

Wireless local area networks (WLANs) are built around the family of IEEE 802.11 standards. The main characteristics
of this standards family are summarized in Table D.5. The baseline 802.11 standard, released in 1997,
occupies 83.5 MHz of bandwidth in the unlicensed 2.4 GHz frequency band. It specifies PSK modulation with
FHSS or DSSS. Data rates up to 2 Mbps are supported, with CSMA/CA used for random access. The baseline
standard was expanded in 1999 to create the 802.11b standard, operating in the same 2.4 GHz band using only
DSSS. This standard uses variable-rate modulation and coding, with BPSK or QPSK for modulation and channel
coding via either Barker sequences or Complementary Code Keying (CCK). This leads to a maximum channel rate
of 11 Mbps, with a maximum user data rate of around 1.6 Mbps. The transmission range is approximately 100
m. The network architecture in 802.11b is specified as either star or peer-to-peer, although the peer-to-peer feature
is not typically used. This standard has been widely deployed and used, with manufacturers integrating 802.11b
wireless LAN cards into many laptop computers.

The 802.11a standard was finalized in 1999 as an extension to 802.11 to improve on the 802.11b data rates.
The 802.11a standard occupies 300 MHz of spectrum in the 5 GHz NII band. In fact, the 300 MHz of bandwidth is
segmented into three 100 MHz subbands: a lower band from 5.15-5.25 GHz, a middle band from 5.25-5.35 GHz,
and an upper band from 5.725-5.825 GHz. Channels are spaced 20 MHz apart, except on the outer edges of the
lower and middle bands, where they are spaced 30 MHz apart. Three maximum transmit power levels are specified:
40 mW for the lower band, 200 mW for the middle band, and 800 mW for the upper band. These restrictions imply
that the lower band is mostly just suitable for indoor applications, the middle band for indoor and outdoor, and the
high band for outdoor. Variable-rate modulation and coding is used on each channel: the modulation varies over
BPSK, QPSK, 16QAM, and 64QAM, and the convolutional code rate varies over 1/2, 2/3, and 3/4. This leads
to a maximum data rate per channel of 54 Mbps. For indoor systems, the 5 GHz carrier coupled with the power
restriction in the lower band reduces the range of 802.11a relative to 802.11b, and also makes it more difficult
for the signal to penetrate walls and other obstructions. 802.11a uses orthogonal frequency division multiplexing
(OFDM) multiple access instead of FHSS or DSSS, and in that sense diverges from the original 802.11 standard.
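
The 54 Mbps maximum follows from combining the densest constellation with the highest code rate. The sketch below reproduces that arithmetic. Two numbers are not given in the text and are assumptions taken from the 802.11a physical layer: 48 data subcarriers per OFDM symbol and a 4 microsecond symbol duration. The list of modulation and code-rate pairings is likewise an assumption about which combinations the standard uses.

```python
from fractions import Fraction

DATA_SUBCARRIERS = 48      # assumed: data-bearing subcarriers per OFDM symbol
SYMBOL_TIME_US = 4         # assumed: OFDM symbol duration in microseconds

BITS_PER_SUBCARRIER = {"BPSK": 1, "QPSK": 2, "16QAM": 4, "64QAM": 6}

# Assumed modulation / convolutional code rate pairings
MODES = [
    ("BPSK", Fraction(1, 2)), ("BPSK", Fraction(3, 4)),
    ("QPSK", Fraction(1, 2)), ("QPSK", Fraction(3, 4)),
    ("16QAM", Fraction(1, 2)), ("16QAM", Fraction(3, 4)),
    ("64QAM", Fraction(2, 3)), ("64QAM", Fraction(3, 4)),
]


def rate_mbps(modulation, code_rate):
    """Information bits per microsecond, which is the same as Mbps."""
    coded_bits = DATA_SUBCARRIERS * BITS_PER_SUBCARRIER[modulation]
    return coded_bits * code_rate / SYMBOL_TIME_US


rates = [rate_mbps(m, r) for m, r in MODES]
for (modulation, code_rate), rate in zip(MODES, rates):
    print(f"{modulation:6s} rate {code_rate}: {rate} Mbps")
```

Under these assumptions the lowest mode (BPSK, rate 1/2) gives 6 Mbps and the highest (64QAM, rate 3/4) gives the 54 Mbps quoted above.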

The 802.11g standard, finalized in 2003, attempts to combine the best of 802.11a and 802.11b, with data rates
of up to 54 Mbps in the 2.4 GHz band for greater range. The standard is backwards compatible with 802.11b so
that 802.11g access points will work with 802.11b wireless network adapters and vice versa. However, 802.11g
uses the OFDM, modulation, and coding schemes of 802.11a. Both access points and wireless LAN cards are
available with all three standards to avoid incompatibilities. The 802.11a/b/g family of standards are collectively
referred to as Wi-Fi, for wireless fidelity. Extending these standards to frequency allocations in countries other than
the US falls under the 802.11d standard. There are several other standards in the 802.11 family that are under
development: these are summarized in Table D.6.

A potential competitor to the 802.11 standards as well as cellular systems is the emerging IEEE 802.16 
standard called WiMAX. This standard promises broadband wireless access with data rates on the order of 40 
Mbps for fixed users and 15 Mbps for mobile users, with a range of several kilometers. Details of the specification 
are still being worked out.





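|                       | 802.11                     | 802.11a                                                    | 802.11b              | 802.11g                    |
|-----------------------|----------------------------|------------------------------------------------------------|----------------------|----------------------------|
| Bandwidth (MHz)       | 83.5                       | 300                                                        | 83.5                 | 83.5                       |
| Frequency Range (GHz) | 2.4-2.4835                 | 5.15-5.25 (lower), 5.25-5.35 (middle), 5.725-5.825 (upper) | 2.4-2.4835           | 2.4-2.4835                 |
| Number of Channels    | 3                          | 12 (4 per subband)                                         | 3                    | 3                          |
| Modulation            | BPSK, QPSK with DSSS, FHSS | BPSK, QPSK, MQAM with OFDM                                 | BPSK, QPSK with DSSS | BPSK, QPSK, MQAM with OFDM |
| Coding                | -                          | Conv. (rate 1/2, 2/3, 3/4)                                 | Barker, CCK          | Conv. (rate 1/2, 2/3, 3/4) |
| Max. Data Rate (Mbps) | 1, 2                       | 54                                                         | 11                   | 54                         |
| Range (m)             | -                          | 27-30 (lower band)                                         | 75-100               | 30                         |
| Random Access         | CSMA/CA                    | CSMA/CA                                                    | CSMA/CA              | CSMA/CA                    |

Table D.5: 802.11 Wireless LAN Link Layer Standards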



D.3 Wireless Short-Distance Networking Standards 

This last section summarizes the main characteristics of Zigbee, Bluetooth, and UWB, which have emerged to
support a wide range of short distance wireless network applications. These specifications are designed to be
compliant with the IEEE 802.15 standards, a family of IEEE standards for short distance wireless networking
called Wireless Personal Area Networks (WPANs). Bluetooth operates in the 2.4 GHz unlicensed band, Zigbee
operates in the same band as well as in the 800 MHz and 900 MHz unlicensed bands, and UWB operates across
a broad range of frequencies in an underlay to existing systems. Zigbee and Bluetooth include link, MAC, and
higher layer protocol specifications, whereas UWB specifies just the link layer protocol. Table D.7 summarizes
the main characteristics of Zigbee (2.4 GHz band only), Bluetooth, and UWB.

Zigbee consists of link and MAC layer protocols that are compliant with the IEEE 802.15.4 standard, as well
as higher layer protocols for ad-hoc networking (mesh, star, or tree topologies), power management, and security.
Zigbee supports data rates up to 250 Kbps with PSK modulation and DSSS. Zigbee generally targets applications
requiring relatively low data rates, low duty cycles, and large networks. Power efficiency is key, with the goal of
nodes operating for months or years on a single battery charge.

| Standard | Scope                                                                                                                      |
|----------|----------------------------------------------------------------------------------------------------------------------------|
| 802.11e  | Provides Quality of Service (QoS) at the MAC layer                                                                         |
| 802.11f  | Roaming protocol across multivendor access points                                                                          |
| 802.11h  | Adds frequency and power management features to 802.11a to make it more compatible with European operation                 |
| 802.11i  | Enhances security and authentication mechanisms                                                                            |
| 802.11j  | Modifies 802.11a link layer to meet Japanese requirements                                                                  |
| 802.11k  | Provides an interface to higher layers for radio and network measurements which can be used for radio resource management  |
| 802.11m  | Maintenance of 802.11 standard (technical/editorial corrections)                                                           |
| 802.11n  | MIMO link enhancements to enable higher throughput                                                                         |

Table D.6: IEEE 802.11 Ongoing Standards Work

In contrast to Zigbee, Bluetooth provides up to 1 Mbps data rate, including three guaranteed low latency 
voice channels, using GFSK modulation and FHSS. Bluetooth normally transmits at a power of 1 mW with a 
transmission range of 10 m, although this can be extended to 100 m by increasing the transmit power to 100 mW. 
Networks are formed in subnet clusters (piconets) of up to 8 nodes, with one node acting as a master and the rest
as slaves. TD is used for channel access, with the master node coordinating the FH sequence and synchronization 
with the slave nodes. Extended networks, or scatternets, can be formed when one node is part of multiple piconets. 
However, forming large networks through this approach is difficult due to the synchronization requirements of 
FHSS. Portions of the Bluetooth standard were formally adopted by the IEEE as its 802.15.1 standard. 

UWB has significantly higher data rates, up to 100 Mbps, than either Zigbee or Bluetooth. It also occupies 
significantly more bandwidth, and has stringent power restrictions to prevent it from interfering with primary band
users. Thus, it is only suitable for short-range indoor applications. UWB only defines a link layer technology,
so it requires a compatible MAC protocol as well as higher layer protocols to become part of a wireless network
standard. The modulation is BPSK or QPSK, with competing camps recommending either OFDM or DSSS overlaid
on the data modulation. UWB is likely to become the link layer technology for the IEEE 802.15.3 standard,
a family of standards for wireless networks supporting imaging and multimedia applications. 





|                        | Zigbee (802.15.4)      | Bluetooth (802.15.1)      | UWB (802.15.3 proposal)         |
|------------------------|------------------------|---------------------------|---------------------------------|
| Frequency Range (GHz)  | 2.4-2.4835             | 2.4-2.4835                | 3.1-10.6                        |
| Bandwidth (MHz)        | 83.5                   | 83.5                      | 7500                            |
| Modulation             | BPSK, OQPSK with DSSS  | GFSK with FHSS            | BPSK, QPSK with OFDM or DSSS    |
| Max. Data Rate (Mbps)  | 0.25                   | 1                         | 100                             |
| Range (m)              | 30                     | 10                        | 10                              |
| Power Consumption (mW) | 5-20                   | 40-100                    | 80-150                          |
| Access                 | CSMA/CA (optional TD)  | TD                        | Undefined                       |
| Networking             | Mesh/Star/Tree         | Subnet Clusters (8 nodes) | Undefined                       |

Table D.7: Short-Range Wireless Network Standards